litert

2 posts

google

XR Blocks: Accelerating AI + XR innovation

XR Blocks is an open-source, cross-platform framework designed to bridge the technical gap between mature AI development ecosystems and high-friction extended reality (XR) prototyping. By providing a modular architecture and high-level abstractions, the toolkit enables creators to rapidly build and deploy intelligent, immersive web applications without managing low-level system integration. Ultimately, the framework empowers developers to move from concept to interactive prototype across both desktop simulators and mobile XR devices using a unified codebase.

### Core Design Principles

* **Simplicity and Readability:** Drawing inspiration from the "Zen of Python," the framework prioritizes human-readable abstractions where a developer's script reflects a high-level description of the experience rather than complex boilerplate code.
* **Creator-Centric Workflow:** The architecture is designed to handle the "plumbing" of XR, such as sensor fusion, AI model integration, and cross-platform logic, allowing creators to focus entirely on user interaction and experience.
* **Pragmatic Modularity:** Rather than attempting to be a perfect, all-encompassing system, XR Blocks favors an adaptable and simple architecture that can evolve alongside the rapidly changing fields of AI and spatial computing.

### The Reality Model Abstractions

* **The Script Primitive:** Acts as the logical center of an application, separating the "what" of an interaction from the "how" of its underlying technical implementation (a hypothetical sketch of such a script follows this summary).
* **User and World:** Provides built-in support for tracking hands, gaze, and avatars while allowing the system to query the physical environment for depth, estimated lighting conditions, and object recognition.
* **AI and Agents:** Facilitates the integration of intelligent assistants, such as the "Sensible Agent," which can provide proactive, context-aware suggestions within the XR environment.
* **Virtual Interfaces:** Offers tools to augment blended reality with virtual UI elements that respond to the user's physical context.

### Technical Implementation and Integration

* **Web-Based Foundation:** The framework is built upon accessible, standard technologies including WebXR, three.js, and LiteRT (formerly TFLite) to ensure a low barrier to entry for web developers.
* **Advanced AI Support:** It features native integration with Gemini for high-level reasoning and context-aware applications.
* **Cross-Platform Deployment:** Developers can prototype depth-aware, physics-based interactions in a desktop simulator and deploy the exact same code to Android XR devices.
* **Open-Source Resources:** The project includes a comprehensive suite of templates and live demos covering specific use cases like depth mapping, gesture modeling, and lighting estimation.

By lowering the barrier to entry for intelligent XR development, XR Blocks serves as a practical starting point for researchers and developers aiming to explore the next generation of human-centered computing. Interested creators can access the source code on GitHub to begin building immersive, AI-driven applications that function seamlessly across the web and specialized XR hardware.
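For a concrete feel of the script-first style described above, here is a minimal, hypothetical TypeScript sketch. The names (`User`, `World`, `Agent`, `Script`, `labelOnPinch`) are illustrative stand-ins, not the actual XR Blocks API; they only mirror the post's separation between a high-level script and the framework's underlying sensing and AI plumbing.

```typescript
// Illustrative sketch only: hypothetical interfaces standing in for the kinds of
// abstractions the post describes (user, world, agent, script). Not the XR Blocks API.

/** What the framework tracks about the person in the headset. */
interface User {
  gazeTarget(): string | null;                  // id of the object being looked at, if any
  isPinching(hand: "left" | "right"): boolean;  // simple hand-gesture query
}

/** Queries against the sensed physical environment. */
interface World {
  depthAt(x: number, y: number): number;        // metres from the camera
  estimatedLightingLux(): number;               // rough scene brightness
  objectsInView(): string[];                    // labels from object recognition
}

/** Hook for an AI agent (e.g. Gemini-backed) that can make proactive suggestions. */
interface Agent {
  suggest(context: { gaze: string | null; objects: string[] }): Promise<string>;
}

/** A "script" describes *what* the experience does, not *how* the sensing works. */
interface Script {
  onFrame(user: User, world: World, agent: Agent): Promise<void>;
}

// Example script: when the user pinches while looking at something recognised,
// ask the agent for a context-aware suggestion and (conceptually) show it as UI.
const labelOnPinch: Script = {
  async onFrame(user, world, agent) {
    if (!user.isPinching("right")) return;
    const suggestion = await agent.suggest({
      gaze: user.gazeTarget(),
      objects: world.objectsInView(),
    });
    console.log(`Virtual panel would display: ${suggestion}`);
  },
};
```

In the real framework, wiring such a script to the depth, lighting, gesture, and agent subsystems is handled by the toolkit; the point of the sketch is only that the script reads as a description of the interaction rather than integration boilerplate.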

google

Introducing interactive on-device segmentation in Snapseed

Google has introduced a new "Object Brush" feature in Snapseed that enables intuitive, real-time selective photo editing through a novel on-device segmentation technology. By leveraging a high-performance interactive AI model, users can isolate complex subjects with simple touch gestures in under 20 milliseconds, bridging the gap between professional-grade editing and mobile convenience. This breakthrough is achieved through a sophisticated teacher-student training architecture that prioritizes both pixel-perfect accuracy and low-latency performance on consumer hardware.

### High-Performance On-Device Inference

* The system is powered by the Interactive Segmenter model, which is integrated directly into the Snapseed "Adjust" tool to facilitate immediate object-based modifications.
* To ensure a fluid user experience, the model utilizes the MediaPipe framework and LiteRT's GPU acceleration to process selections in less than 20 ms (a usage sketch follows this summary).
* The interface supports dynamic refinement, allowing users to provide real-time feedback by tracing lines or tapping to add or subtract specific areas of an image.

### Teacher-Student Model Distillation

* The development team first created "Interactive Segmenter: Teacher," a large-scale model fine-tuned on 30,000 high-quality, pixel-perfect manual annotations across more than 350 object categories.
* Because the Teacher model's size and computational requirements are prohibitive for mobile use, researchers developed "Interactive Segmenter: Edge" through knowledge distillation.
* This distillation process utilized a dataset of over 2 million weakly annotated images, allowing the smaller Edge model to inherit the generalization capabilities of the Teacher model while maintaining a footprint suitable for mobile devices.

### Training via Synthetic User Prompts

* To make the model universally capable across all object types, the training process uses a class-agnostic approach based on the Big Transfer (BiT) strategy.
* The model learns to interpret user intent through "prompt generation," which simulates real-world interactions such as random scribbles, taps, and lasso (box) selections.
* During training, both the Teacher and Edge models receive identical prompts, such as red foreground scribbles and blue background scribbles, to ensure the student model learns to produce high-quality masks even from imprecise user input.

This advancement significantly lowers the barrier to entry for complex photo manipulation by moving heavy-duty AI processing directly onto the mobile device. Users can expect a more responsive and precise editing experience that handles everything from fine-tuning a subject's lighting to isolating specific environmental elements like clouds or clothing.
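The Snapseed "Interactive Segmenter: Edge" model itself is not distributed with the post, but MediaPipe's publicly documented InteractiveSegmenter web task follows the same point-prompted, on-device pattern. The sketch below assumes the current `@mediapipe/tasks-vision` package and uses a placeholder model path; treat it as an illustration of the prompt-to-mask flow rather than Snapseed's actual pipeline.

```typescript
// Illustrative only: point-prompted segmentation with MediaPipe's web task.
// The model path below is a placeholder, not the Snapseed "Edge" model.
import { FilesetResolver, InteractiveSegmenter } from "@mediapipe/tasks-vision";

async function segmentAtTap(image: HTMLImageElement, tapX: number, tapY: number) {
  // Load the WASM runtime for MediaPipe vision tasks.
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
  );

  // Create the segmenter and request a category mask (object vs. background).
  const segmenter = await InteractiveSegmenter.createFromOptions(vision, {
    baseOptions: { modelAssetPath: "interactive_segmenter.tflite" }, // placeholder
    outputCategoryMask: true,
  });

  // The tap becomes a normalized foreground keypoint (0..1), analogous to the
  // foreground prompts described in the training section above.
  segmenter.segment(image, { keypoint: { x: tapX, y: tapY } }, (result) => {
    const mask = result.categoryMask;
    if (mask) {
      // The mask is only guaranteed valid inside this callback; read it here.
      console.log(`Selected-object mask: ${mask.width}x${mask.height}`);
    }
  });

  segmenter.close();
}
```

In Snapseed itself, the post describes this flow running through the MediaPipe framework with LiteRT GPU acceleration, which is what keeps each selection under the reported 20 ms.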