Introducing interactive on-device segmentation in Snapseed

Google has introduced a new "Object Brush" feature in Snapseed that enables intuitive, real-time selective photo editing through a novel on-device segmentation technology. By leveraging a high-performance interactive AI model, users can isolate complex subjects with simple touch gestures in under 20 milliseconds, bridging the gap between professional-grade editing and mobile convenience. This breakthrough is achieved through a sophisticated teacher-student training architecture that prioritizes both pixel-perfect accuracy and low-latency performance on consumer hardware.

High-Performance On-Device Inference

  • The system is powered by the Interactive Segmenter model, which is integrated directly into the Snapseed "Adjust" tool to facilitate immediate object-based modifications.
  • To ensure a fluid user experience, the model uses the MediaPipe framework and LiteRT’s GPU acceleration to process selections in less than 20 ms (a minimal inference sketch follows this list).
  • The interface supports dynamic refinement, allowing users to provide real-time feedback by tracing lines or tapping to add or subtract specific areas of an image.
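
For a concrete sense of this flow, here is a minimal Python sketch built on MediaPipe's publicly available Interactive Segmenter task rather than Snapseed's internal model. The model file name, image path, tap coordinates, and GPU-delegate setting are placeholders, and exact module paths may differ across MediaPipe releases.

```python
# Illustrative only: MediaPipe's public Interactive Segmenter task,
# not Snapseed's production Interactive Segmenter model.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.tasks.python.components import containers

RegionOfInterest = vision.InteractiveSegmenterRegionOfInterest
NormalizedKeypoint = containers.keypoint.NormalizedKeypoint

# Placeholder model path; delegate=GPU requests hardware acceleration where available.
base_options = python.BaseOptions(
    model_asset_path="magic_touch.tflite",
    delegate=python.BaseOptions.Delegate.GPU,
)
options = vision.InteractiveSegmenterOptions(
    base_options=base_options,
    output_category_mask=True,
)

with vision.InteractiveSegmenter.create_from_options(options) as segmenter:
    image = mp.Image.create_from_file("photo.jpg")  # placeholder image
    # A single "tap" at normalized coordinates, standing in for a user touch.
    roi = RegionOfInterest(
        format=RegionOfInterest.Format.KEYPOINT,
        keypoint=NormalizedKeypoint(0.62, 0.41),
    )
    result = segmenter.segment(image, roi)
    mask = result.category_mask.numpy_view()  # per-pixel object mask
```

A tap is expressed as a normalized keypoint; newer MediaPipe releases also accept scribble-style regions of interest, which maps more directly onto the line-tracing interaction described above.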

Teacher-Student Model Distillation

  • The development team first created "Interactive Segmenter: Teacher," a large-scale model fine-tuned on 30,000 high-quality, pixel-perfect manual annotations across more than 350 object categories.
  • Because the Teacher model’s size and computational requirements are prohibitive for mobile use, researchers developed "Interactive Segmenter: Edge" through knowledge distillation.
  • This distillation process used a dataset of over 2 million weakly annotated images, allowing the smaller Edge model to inherit the generalization capabilities of the Teacher model while keeping a footprint suitable for mobile devices (a simplified distillation step is sketched after this list).
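
Google has not published the Edge model's training code, so the PyTorch sketch below only illustrates the general shape of such a distillation step: the Teacher's soft mask on a weakly annotated image serves as the target for the student. The teacher, student, prompt encoding, and the pixel-wise binary cross-entropy loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, images, prompts, optimizer):
    """One illustrative distillation step: the Edge (student) model learns to
    reproduce the Teacher's mask on a weakly annotated image, given the same
    synthetic prompt. Models, prompt encoding, and loss are placeholders."""
    teacher.eval()
    with torch.no_grad():
        # The Teacher produces a soft pseudo-label mask for the prompted object.
        soft_target = torch.sigmoid(teacher(images, prompts))

    # The student sees the identical image and prompt and predicts its own mask.
    student_logits = student(images, prompts)

    # A pixel-wise loss against the Teacher's soft mask stands in for whatever
    # objective the production pipeline actually uses.
    loss = F.binary_cross_entropy_with_logits(student_logits, soft_target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```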

Training via Synthetic User Prompts

  • To make the model universally capable across all object types, the training process uses a class-agnostic approach based on the Big Transfer (BiT) strategy.
  • The model learns to interpret user intent through "prompt generation," which simulates real-world interactions such as random scribbles, taps, and lasso (box) selections.
  • During training, both the Teacher and Edge models receive identical prompts, such as red foreground scribbles and blue background scribbles, so that the student model learns to produce high-quality masks even from imprecise user input (a toy prompt-generation sketch follows this list).
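
As a toy illustration of this prompt generation, the NumPy sketch below derives a synthetic tap, scribble, or box prompt from a ground-truth object mask and encodes it as two extra input channels, loosely mirroring the red/blue scribbles described above. The function name, channel encoding, and sampling heuristics are assumptions rather than the published procedure.

```python
import numpy as np

def synthesize_prompt(gt_mask, rng, mode=None):
    """Render one synthetic user prompt (tap, scribble, or box) from a
    ground-truth object mask as two prompt channels: foreground ("red")
    and background ("blue"). Assumes a non-empty binary mask."""
    h, w = gt_mask.shape
    fg = np.zeros((h, w), dtype=np.float32)
    bg = np.zeros((h, w), dtype=np.float32)
    mode = mode or rng.choice(["tap", "scribble", "box"])

    ys, xs = np.nonzero(gt_mask)

    def stroke(channel, py, px):
        # Draw a straight stroke between two randomly chosen points.
        i, j = rng.integers(len(py), size=2)
        for t in np.linspace(0.0, 1.0, num=64):
            y = int(round(py[i] * (1 - t) + py[j] * t))
            x = int(round(px[i] * (1 - t) + px[j] * t))
            channel[y, x] = 1.0

    if mode == "tap":
        # A single tap: one point sampled inside the object.
        i = rng.integers(len(ys))
        fg[ys[i], xs[i]] = 1.0
    elif mode == "scribble":
        # A rough foreground stroke inside the object, plus a background stroke
        # between two points outside it (a real generator would keep that
        # stroke clear of the object).
        stroke(fg, ys, xs)
        bys, bxs = np.nonzero(gt_mask == 0)
        stroke(bg, bys, bxs)
    else:  # "box": a rectangle traced along the object's bounding box
        y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
        fg[y0:y1 + 1, [x0, x1]] = 1.0
        fg[[y0, y1], x0:x1 + 1] = 1.0

    # The two channels are concatenated with the RGB image and fed identically
    # to the Teacher and Edge models during distillation.
    return np.stack([fg, bg], axis=0)
```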

This advancement significantly lowers the barrier to entry for complex photo manipulation by moving heavy-duty AI processing directly onto the mobile device. Users can expect a more responsive and precise editing experience that handles everything from fine-tuning a subject's lighting to isolating specific environmental elements like clouds or clothing.