on-device-ai

4 posts

google

Introducing interactive on-device segmentation in Snapseed

Google has introduced a new "Object Brush" feature in Snapseed that enables intuitive, real-time selective photo editing through a novel on-device segmentation technology. Powered by a high-performance interactive AI model, the feature lets users isolate complex subjects with simple touch gestures in under 20 milliseconds, bridging the gap between professional-grade editing and mobile convenience. This breakthrough is achieved through a teacher-student training architecture that prioritizes both pixel-perfect accuracy and low-latency performance on consumer hardware.

### High-Performance On-Device Inference

* The system is powered by the Interactive Segmenter model, which is integrated directly into the Snapseed "Adjust" tool to facilitate immediate object-based modifications.
* To ensure a fluid user experience, the model uses the MediaPipe framework and LiteRT's GPU acceleration to process selections in less than 20 ms.
* The interface supports dynamic refinement, allowing users to provide real-time feedback by tracing lines or tapping to add or subtract specific areas of an image.

### Teacher-Student Model Distillation

* The development team first created "Interactive Segmenter: Teacher," a large-scale model fine-tuned on 30,000 high-quality, pixel-perfect manual annotations spanning more than 350 object categories.
* Because the Teacher model's size and computational requirements are prohibitive for mobile use, researchers developed "Interactive Segmenter: Edge" through knowledge distillation.
* The distillation process used a dataset of over 2 million weakly annotated images, allowing the smaller Edge model to inherit the generalization capabilities of the Teacher model while maintaining a footprint suitable for mobile devices.

### Training via Synthetic User Prompts

* To make the model universally capable across all object types, the training process uses a class-agnostic approach based on the Big Transfer (BiT) strategy.
* The model learns to interpret user intent through "prompt generation," which simulates real-world interactions such as random scribbles, taps, and lasso (box) selections.
* During training, both the Teacher and Edge models receive identical prompts, such as red foreground scribbles and blue background scribbles, to ensure the student model learns to produce high-quality masks even from imprecise user input.

This advancement significantly lowers the barrier to entry for complex photo manipulation by moving heavy-duty AI processing directly onto the mobile device. Users can expect a more responsive and precise editing experience that handles everything from fine-tuning a subject's lighting to isolating specific environmental elements like clouds or clothing.
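The "prompt generation" idea above can be sketched as a toy NumPy routine. Everything here is an illustrative assumption, not Google's actual training code: the function names, the single background tap, and the two-channel foreground/background encoding (standing in for the red/blue scribbles) are all hypothetical.

```python
import numpy as np

def simulate_tap(mask, rng):
    """Pick one random foreground pixel as a single-point 'tap' prompt."""
    ys, xs = np.nonzero(mask)
    i = rng.integers(len(ys))
    prompt = np.zeros(mask.shape, dtype=np.float32)
    prompt[ys[i], xs[i]] = 1.0
    return prompt

def simulate_scribble(mask, rng, n_points=5):
    """Connect a few random foreground pixels into a rough polyline 'scribble'."""
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(ys), size=min(n_points, len(ys)), replace=False)
    prompt = np.zeros(mask.shape, dtype=np.float32)
    pts = list(zip(ys[idx], xs[idx]))
    for (y0, x0), (y1, x1) in zip(pts, pts[1:]):
        steps = int(max(abs(y1 - y0), abs(x1 - x0))) + 1
        for t in np.linspace(0.0, 1.0, steps):
            prompt[int(round(y0 + t * (y1 - y0))),
                   int(round(x0 + t * (x1 - x0)))] = 1.0
    return prompt

def simulate_box(mask):
    """Fill the object's tight bounding box (a 'lasso'/box selection)."""
    ys, xs = np.nonzero(mask)
    prompt = np.zeros(mask.shape, dtype=np.float32)
    prompt[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = 1.0
    return prompt

def make_prompt_channels(mask, rng):
    """Stack one foreground prompt and one background prompt as two extra
    input channels, mirroring the foreground/background scribble pairing."""
    kind = rng.choice(["tap", "scribble", "box"])
    if kind == "tap":
        fg = simulate_tap(mask, rng)
    elif kind == "scribble":
        fg = simulate_scribble(mask, rng)
    else:
        fg = simulate_box(mask)
    bg = simulate_tap(1 - mask, rng)  # a background tap as the negative prompt
    return np.stack([fg, bg])
```

Feeding the same sampled channels to both the Teacher and the Edge model during distillation is what lets the student learn to produce clean masks from such imprecise inputs.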

google

Synthetic and federated: Privacy-preserving domain adaptation with LLMs for mobile applications

Researchers at Google have developed a framework for improving both small and large language models (LMs) in mobile applications like Gboard by utilizing privacy-preserving synthetic data and federated learning. This approach combines differential privacy (DP) with large language model (LLM) generation to minimize data memorization risks while achieving significant gains in production metrics like next-word prediction and proofreading. The result is a robust pipeline that allows models to adapt to specific user domains without compromising individual privacy or requiring centralized data storage.

### Strengthening Privacy with DP-FL

* Gboard has transitioned all production LMs trained on user data to a Federated Learning with Differential Privacy (DP-FL) framework, ensuring data remains on-device and limiting what the models can memorize.
* The deployment utilizes the **BLT-DP-FTRL** algorithm, which offers an optimized trade-off between privacy guarantees and model utility while being easier to deploy in production.
* Engineers adopted the **SI-CIFG** model architecture to facilitate efficient on-device training, ensuring the hardware can handle local updates while maintaining compatibility with DP constraints.

### Synthetic Data Generation via Public LLMs

* Powerful LLMs trained on public web data are prompted to synthesize high-quality text that mimics mobile user interactions without ever accessing actual private user data.
* The process involves a two-step prompting strategy: first, filtering public datasets to identify topics common in mobile communication, and second, generating new, domain-specific text based on those patterns.
* This synthetic data serves as a bridge for pre-training small LMs, which are then refined through private post-training on-device to capture the nuances of user behavior.
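The core DP-FL mechanism can be illustrated with a minimal sketch of one clipped-and-noised federated averaging round. This is a generic illustration under simplifying assumptions, not Gboard's actual BLT-DP-FTRL algorithm; the function name and hyperparameters are hypothetical.

```python
import numpy as np

def dp_federated_round(global_weights, client_updates, clip_norm=1.0,
                       noise_multiplier=0.5, rng=None):
    """One DP federated-averaging round (illustrative only):
      1. clip each client's model delta to bound any one user's influence,
      2. average the clipped deltas,
      3. add Gaussian noise calibrated to the clip norm.
    Only the noised aggregate ever leaves the device population."""
    rng = rng if rng is not None else np.random.default_rng()
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append(delta * scale)
    mean_delta = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(client_updates),
                       size=mean_delta.shape)
    return global_weights + mean_delta + noise
```

Each client would compute `delta = local_weights - global_weights` on-device after a few steps of local training; the server only ever sees the clipped, noised average.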
### Adapting LLMs for Mobile Proofreading

* To support advanced features like Gboard's "Proofread," researchers developed a "Synthesize-then-Adapt" pipeline specifically for error correction.
* LLMs generate synthetic "corrupted" text to simulate common mobile typing errors, providing the (error, correction) training pairs that are difficult to find in public datasets.
* Federated learning is then used to adapt these error-correction models to specific app domains (such as messaging or email) using on-device signals, ensuring the model understands the specific context of the user's typing.

The success of these techniques in Gboard demonstrates that synthetic data can effectively replace or augment private data throughout the machine learning lifecycle. For developers working with sensitive user information, adopting a "synthetic-first" approach combined with federated learning provides a scalable path to model improvement that adheres to the core principles of data minimization and anonymization.
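In the described pipeline, the "corrupted" text is generated by prompting an LLM. As a minimal rule-based stand-in, the pair-manufacturing step might look like the sketch below; the key-adjacency table and error rates are invented for illustration.

```python
import random

# Hypothetical neighbor-key table and error rates; the real pipeline prompts
# an LLM to produce the corrupted text.
ADJACENT_KEYS = {"a": "sq", "e": "wr", "i": "uo", "o": "ip", "t": "ry", "n": "bm"}

def corrupt(text, rng, p=0.15):
    """Inject mobile-style typos (fat-finger substitutions, dropped and
    transposed characters) into clean text."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        r = rng.random()
        if r < p and c in ADJACENT_KEYS:            # neighbor-key substitution
            out.append(rng.choice(ADJACENT_KEYS[c]))
        elif r < 1.5 * p and c.isalpha():           # dropped character
            pass
        elif r < 2 * p and i + 1 < len(chars):      # transposed pair
            out.append(chars[i + 1])
            out.append(c)
            i += 1
        else:
            out.append(c)
        i += 1
    return "".join(out)

def make_pairs(sentences, seed=0):
    """Build (corrupted, clean) training pairs for an error-correction model."""
    rng = random.Random(seed)
    return [(corrupt(s, rng), s) for s in sentences]
```

The resulting (corrupted, clean) pairs are exactly the supervision an error-correction model needs; an LLM-based corrupter simply produces far more realistic errors than hand-written rules like these.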

google

Google Research at Google I/O 2025

Google Research at I/O 2025 showcases the "research to reality" transition, highlighting how years of foundational breakthroughs are now being integrated into Gemini models and specialized products. By focusing on multimodal capabilities, pedagogy, and extreme model efficiency, Google aims to democratize access to advanced AI while ensuring it remains grounded and useful across global contexts.

## Specialized Healthcare Models: MedGemma and AMIE

* **MedGemma:** This new open model, based on Gemma 3, is optimized for multimodal medical tasks such as radiology image analysis and clinical data summarization. It is available in 4B and 27B sizes, performing similarly to much larger models on the MedQA benchmark while remaining small enough for efficient local fine-tuning.
* **AMIE (Articulate Medical Intelligence Explorer):** A research AI agent designed for diagnostic medical reasoning. Its latest multimodal version can now interpret and reason about visual medical information, such as skin lesions or medical imaging, to help improve clinicians' diagnostic accuracy.

## Educational Optimization through LearnLM

* **Gemini 2.5 Pro Integration:** The LearnLM family of models, developed with educational experts, is now integrated into Gemini 2.5 Pro. This fine-tuning enhances STEM reasoning, multimodal understanding, and pedagogical feedback.
* **Interactive Learning Tools:** A new research-optimized quiz experience allows students to generate custom assessments from their own notes, providing specific feedback on right and wrong answers rather than just supplying solutions.
* **Global Assessment Pilots:** Through partnerships like the one with Kayma, Google is testing automatic assessment of short- and long-form content in regions like Ghana to scale quality educational tools.

## Multilingual Expansion and On-Device Gemma Models

* **Gemma 3 and 3n:** Research breakthroughs have expanded Gemma 3's support to over 140 languages. The introduction of **Gemma 3n** targets extreme efficiency, capable of running on devices with as little as 2GB of RAM while maintaining low latency and low energy consumption.
* **ECLeKTic Benchmark:** To assist the developer community, Google introduced this novel benchmark specifically for evaluating how well large language models transfer knowledge across different languages.

## Model Efficiency and Factuality in Search

* **Inference Techniques:** Google Research continues to set industry standards for model speed and accessibility through technical innovations like **speculative decoding** and **cascades**, which reduce the computational cost of generating high-quality responses.
* **Grounded Outputs:** Significant focus remains on factual consistency, ensuring that the AI models powering features like AI Overviews in Search provide reliable and grounded information to users.

As Google continues to shrink the gap between laboratory breakthroughs and consumer products, the emphasis remains on making high-performance AI accessible on low-cost hardware and across diverse linguistic landscapes. Developers and researchers can now leverage these specialized tools via platforms like HuggingFace and Vertex AI to build more targeted, efficient applications.
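Of the inference techniques mentioned, speculative decoding can be sketched with toy next-token functions. Real systems sample from probability distributions and verify the whole draft in a single batched forward pass; treat this greedy, deterministic version as an illustration of the control flow only, with all names hypothetical.

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Greedy speculative-decoding sketch: a cheap draft model proposes k
    tokens; the expensive target model checks them (one batched pass in
    practice) and keeps the agreeing prefix plus its own correction.
    `target_next` / `draft_next` map a token sequence to the next token."""
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively (cheap to run).
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # One target pass verifies every proposed position at once.
        target_passes += 1
        accepted = []
        for i in range(k):
            expect = target_next(seq + proposal[:i])
            accepted.append(expect)
            if proposal[i] != expect:
                break  # draft diverged; keep the target's correction and restart
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens], target_passes
```

When the draft agrees with the target, each expensive target pass yields up to k tokens instead of one, which is where the speed-up comes from; cascades apply a similar idea at the whole-query level by routing easy requests to a smaller model.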