on-device-ai

2 posts

google

Introducing interactive on-device segmentation in Snapseed

Google has introduced a new "Object Brush" feature in Snapseed that enables intuitive, real-time selective photo editing through a novel on-device segmentation technology. By leveraging a high-performance interactive AI model, users can isolate complex subjects with simple touch gestures in under 20 milliseconds, bridging the gap between professional-grade editing and mobile convenience. This is achieved through a teacher-student training architecture that prioritizes both pixel-perfect accuracy and low-latency performance on consumer hardware.

### High-Performance On-Device Inference

* The system is powered by the Interactive Segmenter model, which is integrated directly into the Snapseed "Adjust" tool to facilitate immediate object-based modifications.
* To ensure a fluid user experience, the model utilizes the MediaPipe framework and LiteRT’s GPU acceleration to process selections in less than 20 ms (a minimal inference sketch appears at the end of this summary).
* The interface supports dynamic refinement, allowing users to provide real-time feedback by tracing lines or tapping to add or subtract specific areas of an image.

### Teacher-Student Model Distillation

* The development team first created "Interactive Segmenter: Teacher," a large-scale model fine-tuned on 30,000 high-quality, pixel-perfect manual annotations across more than 350 object categories.
* Because the Teacher model’s size and computational requirements are prohibitive for mobile use, researchers developed "Interactive Segmenter: Edge" through knowledge distillation.
* This distillation process utilized a dataset of over 2 million weakly annotated images, allowing the smaller Edge model to inherit the generalization capabilities of the Teacher model while maintaining a footprint suitable for mobile devices.

### Training via Synthetic User Prompts

* To make the model capable across arbitrary object types, the training process uses a class-agnostic approach based on the Big Transfer (BiT) strategy.
* The model learns to interpret user intent through "prompt generation," which simulates real-world interactions such as random scribbles, taps, and lasso (box) selections.
* During training, both the Teacher and Edge models receive identical prompts, such as red foreground scribbles and blue background scribbles, to ensure the student model learns to produce high-quality masks even from imprecise user input (a distillation sketch appears at the end of this summary).

This advancement significantly lowers the barrier to entry for complex photo manipulation by moving heavy-duty AI processing directly onto the mobile device. Users can expect a more responsive and precise editing experience that handles everything from fine-tuning a subject's lighting to isolating specific environmental elements like clouds or clothing.
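The post names MediaPipe and LiteRT as the runtime stack but does not publish the model's input format. Below is a minimal sketch of on-device inference with the standard LiteRT/TFLite interpreter, assuming a hypothetical `interactive_segmenter_edge.tflite` file whose input is an RGB image plus one extra channel encoding a user tap; the file name, prompt encoding, and output layout are illustrative assumptions, not Snapseed's actual interface.

```python
# Hedged sketch: run an assumed interactive-segmentation TFLite model with a
# single foreground tap encoded as an extra input channel.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="interactive_segmenter_edge.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

h, w = inp["shape"][1], inp["shape"][2]              # model's expected resolution
image = np.random.rand(h, w, 3).astype(np.float32)   # stand-in for the photo

# Assumed prompt format: a sparse channel with 1.0 at the tapped pixel.
prompt = np.zeros((h, w, 1), dtype=np.float32)
prompt[h // 2, w // 2, 0] = 1.0

x = np.concatenate([image, prompt], axis=-1)[None, ...]  # (1, H, W, 4)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
mask = interpreter.get_tensor(out["index"])[0]       # per-pixel foreground probability
print("mask shape:", mask.shape)
```

On mobile, the same interpreter would be created with a GPU delegate to reach the sub-20 ms budget the post describes; the Python sketch above only illustrates the data flow.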
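To make the teacher-student setup concrete, here is a hedged sketch of one training step that combines synthetic prompt generation with distillation: random foreground/background "scribble" channels are sampled from a ground-truth mask, both models see the identical prompts, and the student is pulled toward the teacher's soft mask. The scribble sampler, loss weighting, and the `teacher`/`student` Keras-style model interfaces are assumptions for illustration, not Google's published recipe.

```python
# Hedged sketch: synthetic scribble prompts + teacher-student distillation step.
import numpy as np
import tensorflow as tf

def synthetic_scribble(mask: np.ndarray, foreground: bool, n_points: int = 10) -> np.ndarray:
    """Sample a crude scribble channel by picking random pixels inside
    (foreground) or outside (background) the ground-truth object mask."""
    ys, xs = np.where(mask > 0.5) if foreground else np.where(mask <= 0.5)
    scribble = np.zeros_like(mask, dtype=np.float32)
    idx = np.random.choice(len(ys), size=min(n_points, len(ys)), replace=False)
    scribble[ys[idx], xs[idx]] = 1.0
    return scribble

def distillation_step(image, gt_mask, teacher, student, optimizer):
    """Both models receive the same image and identical prompt channels; the
    student is trained against the teacher's soft mask and the ground truth."""
    fg = synthetic_scribble(gt_mask, foreground=True)    # "red" foreground scribble
    bg = synthetic_scribble(gt_mask, foreground=False)   # "blue" background scribble
    x = np.concatenate([image, fg[..., None], bg[..., None]], axis=-1)[None]
    x = tf.convert_to_tensor(x, dtype=tf.float32)

    teacher_mask = tf.stop_gradient(teacher(x))          # soft target from the large model
    bce = tf.keras.losses.BinaryCrossentropy()
    with tf.GradientTape() as tape:
        student_mask = student(x)
        loss = bce(teacher_mask, student_mask) + bce(gt_mask[None, ..., None], student_mask)
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```

The weakly annotated 2-million-image set would be fed through such a loop with only the teacher's masks as targets, which is what lets the Edge model inherit the teacher's generalization without pixel-perfect labels.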

google

Google Research at Google I/O 2025

Google Research at I/O 2025 showcases the "research to reality" transition, highlighting how years of foundational breakthroughs are now being integrated into Gemini models and specialized products. By focusing on multimodal capabilities, pedagogy, and extreme model efficiency, Google aims to democratize access to advanced AI while ensuring it remains grounded and useful across global contexts.

## Specialized Healthcare Models: MedGemma and AMIE

* **MedGemma:** This new open model, based on Gemma 3, is optimized for multimodal medical tasks such as radiology image analysis and clinical data summarization. It is available in 4B and 27B sizes, performing comparably to much larger models on the MedQA benchmark while remaining small enough for efficient local fine-tuning (a loading sketch appears at the end of this summary).
* **AMIE (Articulate Medical Intelligence Explorer):** A research AI agent designed for diagnostic medical reasoning. Its latest multimodal version can now interpret and reason about visual medical information, such as skin lesions or medical imaging, to assist clinicians with diagnostic accuracy.

## Educational Optimization through LearnLM

* **Gemini 2.5 Pro Integration:** The LearnLM family of models, developed with educational experts, is now integrated into Gemini 2.5 Pro. This fine-tuning enhances STEM reasoning, multimodal understanding, and pedagogical feedback.
* **Interactive Learning Tools:** A new research-optimized quiz experience allows students to generate custom assessments from their own notes, providing specific feedback on right and wrong answers rather than just supplying solutions (a quiz-generation sketch appears at the end of this summary).
* **Global Assessment Pilots:** Through partnerships like the one with Kayma, Google is testing the automatic assessment of short- and long-form answers in regions like Ghana to scale quality educational tools.

## Multilingual Expansion and On-Device Gemma Models

* **Gemma 3 and 3n:** Research breakthroughs have expanded Gemma 3's support to over 140 languages. The new **Gemma 3n** targets extreme efficiency, capable of running on devices with as little as 2 GB of RAM while maintaining low latency and low energy consumption.
* **ECLeKTic Benchmark:** To assist the developer community, Google introduced this novel benchmark specifically for evaluating how well large language models transfer knowledge across different languages.

## Model Efficiency and Factuality in Search

* **Inference Techniques:** Google Research continues to improve model speed and accessibility through technical innovations like **speculative decoding** and **cascades**, which reduce the computational cost of generating high-quality responses (a decoding sketch appears at the end of this summary).
* **Grounded Outputs:** Significant focus remains on factual consistency, ensuring that the AI models powering features like AI Overviews in Search provide reliable and grounded information to users.

As Google continues to shrink the gap between laboratory breakthroughs and consumer products, the emphasis remains on making high-performance AI accessible on low-cost hardware and across diverse linguistic landscapes. Developers and researchers can leverage these specialized tools via platforms like Hugging Face and Vertex AI to build more targeted, efficient applications.
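Since the post points developers to Hugging Face for the open models, here is a hedged sketch of loading a MedGemma checkpoint for a text-only summarization prompt with the `transformers` library. The model id `google/medgemma-27b-text-it` and the chat-style prompt are assumptions based on the announcement; the weights are gated and large, so check the official model card and licensing terms before running this.

```python
# Hedged sketch: text-generation with an assumed MedGemma checkpoint from Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/medgemma-27b-text-it",  # assumed id for the text-only 27B variant
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Summarize the key findings in this discharge note: ..."},
]
print(generator(messages, max_new_tokens=200)[0]["generated_text"])
```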
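The quiz experience described under LearnLM is a product feature rather than a public API, but a developer can approximate the idea by prompting Gemini 2.5 Pro through the Gemini API (`google-genai` SDK). The prompt wording below is an assumption; only the client calls reflect the actual SDK.

```python
# Hedged sketch: generate a quiz from a student's own notes with Gemini 2.5 Pro.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

notes = "Photosynthesis converts light energy into chemical energy stored in glucose..."
prompt = (
    "Create a 5-question multiple-choice quiz from these notes. "
    "For each question, give the correct answer and a one-sentence explanation "
    "of why each wrong option is wrong.\n\n" + notes
)

response = client.models.generate_content(model="gemini-2.5-pro", contents=prompt)
print(response.text)
```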
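Speculative decoding, one of the efficiency techniques the post highlights, is available in open tooling as "assisted generation": a small draft model proposes tokens that the larger target model verifies in parallel. The sketch below uses Hugging Face `transformers`; the specific Gemma 2 target/draft pairing is an illustrative assumption, and any pair of models sharing a tokenizer would work.

```python
# Hedged sketch: speculative decoding via transformers' assisted generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id, draft_id = "google/gemma-2-9b-it", "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain why on-device AI reduces latency.", return_tensors="pt").to(target.device)
# assistant_model enables draft-and-verify decoding; output quality matches the target model.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Cascades, the other technique mentioned, take the complementary route of answering easy queries with a small model outright and escalating only hard ones to the large model.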