toss

Creating the worst experience at Toss

Toss designer Lee Hyeon-jeong argues that business goals and user experience are not mutually exclusive, even when integrating controversial elements like advertising. By identifying the intersection between monetization and usability, her team transformed intrusive ads into value-driven features that maintain user trust while driving significant revenue. The ultimate conclusion is that transparency and appropriate rewards can mitigate negative feedback and even increase user engagement.

### Reducing Friction through Predictability and Placement

* Addressed "surprise" ads by introducing clear labeling, such as "Watch Ad" buttons or specifying ad durations (e.g., "30-second ad"), which reduced negative sentiment without decreasing revenue.
* Discovered that when users are given a choice and clear expectations, their anxiety decreases and their willingness to engage with the content increases.
* Eliminated "flow-breaking" ads that mimicked functional UI elements, such as banners placed inside transaction histories that users frequently mistook for personal bank records.
* Established a design principle to place advertisements only in areas that do not interfere with information discovery or core user navigation tasks.

### Transforming Advertisements into User Benefits

* Developed a dedicated B2B ad platform to scale the variety of available advertisements, ensuring that users receive ads relevant to their specific life stages, such as car insurance or new credit cards.
* Shifted the internal perception of ads from "noise" to "benefits" by focusing on the right timing and high-quality matching between the advertiser and the user's needs.
* Institutionalized regular "creative ideation sessions" to explore interactive formats, including advertisements that respond to phone movement (gyroscope), quizzes, and mini-games.
* Leveraged long-term internal experiments to ensure that even if an idea cannot be implemented immediately, it remains in the team's "creative bank" for future product opportunities.

### Optimizing Value Exchange through Rewards

* Conducted over a year of A/B testing on reward thresholds, comparing small cash amounts (1 KRW to 200 KRW), non-monetary items (gifticons), and high-stakes lottery-style prizes.
* Analyzed the "labor intensity" of ads by adjusting lengths (10 to 30 seconds) to find the psychological tipping point where users felt the reward was worth their time.
* Implemented a high-value lottery system within the Toss Pedometer service, which successfully transitioned a loss-making feature into a profitable revenue stream.
* Maintained user activity and satisfaction levels despite the increased presence of ads by ensuring the "worst-case experience"—viewing ads for no gain—was entirely avoided.

Product teams should stop viewing business requirements and UX as a zero-sum game. By focusing on user psychology—specifically transparency, non-disruption, and fair value exchange—it is possible to achieve aggressive business targets while maintaining a sustainable and trusted user environment.

google

DS-STAR: A state-of-the-art versatile data science agent

DS-STAR is an advanced autonomous data science agent developed to handle the complexity and heterogeneity of real-world data tasks, ranging from statistical analysis to visualization. By integrating a specialized file analysis module with an iterative planning and verification loop, the system can interpret unstructured data and refine its reasoning steps dynamically based on execution feedback. This architecture allows DS-STAR to achieve state-of-the-art performance on major industry benchmarks, effectively bridging the gap between natural language queries and executable, verified code.

## Comprehensive Data File Analysis

The framework addresses a major limitation of current agents—the over-reliance on structured CSV files—by implementing a dedicated analysis stage for diverse data formats.

* The system automatically scans a directory to extract context from heterogeneous formats, including JSON, unstructured text, and markdown files.
* A Python-based analysis script generates a textual summary of the data structure and content, which serves as the foundational context for the planning phase.
* This module ensures the agent can navigate complex, multi-file environments where critical information is often spread across non-relational sources.

## Iterative Planning and Verification Architecture

DS-STAR utilizes a sophisticated loop involving four specialized roles to mimic the workflow of a human expert conducting sequential analysis.

* **Planner and Coder:** A Planner agent establishes high-level objectives, which a Coder agent then translates into executable Python scripts.
* **LLM-based Verification:** A Verifier agent acts as a judge, assessing whether the generated code and its output are sufficient to solve the problem or if the reasoning is flawed.
* **Dynamic Routing:** If the Verifier identifies gaps, a Router agent guides the refinement process by adding new steps or correcting errors, allowing the cycle to repeat for up to 10 rounds.
* **Intermediate Review:** The agent reviews intermediate results before proceeding to the next step, similar to how data scientists use interactive environments like Google Colab.

## Benchmarking and State-of-the-Art Performance

The effectiveness of the DS-STAR framework was validated through rigorous testing against existing agents like AutoGen and DA-Agent.

* The agent secured the top rank on the public DABStep leaderboard, raising accuracy from 41.0% to 45.2% compared to previous best-performing models.
* Performance gains were consistent across other benchmarks, including KramaBench (39.8% to 44.7%) and DA-Code (37.0% to 38.5%).
* DS-STAR showed a significant advantage in "hard" tasks—those requiring the synthesis of information from multiple, varied data sources—demonstrating its superior versatility in complex environments.

By automating the time-intensive tasks of data wrangling and verification, DS-STAR provides a robust template for the next generation of AI assistants. Organizations looking to scale their data science capabilities should consider adopting iterative agentic workflows that prioritize multi-format data understanding and self-correcting execution loops.
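To make the planner/coder/verifier/router cycle described above concrete, here is a minimal Python sketch of such a loop capped at 10 rounds. The helper functions (`call_llm`, `run_python`) and the role prompts are hypothetical placeholders, not the DS-STAR implementation.

```python
# Minimal sketch of an iterative plan/code/verify/route loop in the spirit of
# DS-STAR. All helpers are hypothetical stand-ins for real LLM and sandboxed
# code-execution backends.

MAX_ROUNDS = 10  # the refinement cycle is capped at 10 rounds

def call_llm(role: str, prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM acting as `role` and return its text reply."""
    raise NotImplementedError

def run_python(code: str) -> str:
    """Placeholder: execute generated code in a sandbox and return its output."""
    raise NotImplementedError

def solve(task: str, data_summary: str) -> str:
    # The planner works from the textual data-file summary produced earlier.
    plan = call_llm("planner", f"Task: {task}\nData summary:\n{data_summary}\nWrite a step-by-step plan.")
    output = ""
    for _ in range(MAX_ROUNDS):
        code = call_llm("coder", f"Plan:\n{plan}\nWrite Python that executes the next step.")
        output = run_python(code)
        verdict = call_llm(
            "verifier",
            f"Task: {task}\nPlan:\n{plan}\nExecution output:\n{output}\n"
            "Is this sufficient to answer the task? Answer YES or explain the gap.",
        )
        if verdict.strip().upper().startswith("YES"):
            return output  # verified result
        # The router decides how to refine: add a new step or fix the flawed one.
        plan = call_llm("router", f"Current plan:\n{plan}\nVerifier feedback:\n{verdict}\nRevise the plan.")
    return output  # best effort once the round budget is exhausted
```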

netflix

Netflix's Metaflow Spin: Faster ML Development | Netflix TechBlog

Netflix has introduced Spin, a new functionality within the Metaflow framework designed to significantly accelerate the iterative development cycle for ML and AI workflows. By bridging the gap between the interactive speed of notebooks and the production-grade reliability of versioned workflows, Spin allows developers to experiment with stateful increments without the latency of full restarts. This enhancement ensures that the "prototype to production" pipeline remains fluid while maintaining the deterministic execution and explicit state management that Metaflow provides at scale.

### The Nature of ML and AI Iteration

* ML and AI development is distinct from traditional software engineering because it involves large, mutable datasets and computationally expensive, stochastic processes.
* State management is a primary concern in this domain, as reloading data or recomputing transformations for every minor code change creates a prohibitively slow feedback loop.
* While notebooks like Jupyter or Marimo excel at preserving in-memory state for fast exploration, they often lead to "hidden state" problems and non-deterministic results due to out-of-order cell execution.

### Metaflow as a State-Aware Framework

* Metaflow uses the `@step` decorator to define checkpoint boundaries where the framework automatically persists all instance variables as versioned artifacts.
* The framework’s `resume` command allows developers to restart execution from a specific step, cloning previous state to avoid recomputing successful upstream tasks.
* This architecture addresses notebook limitations by ensuring execution order is explicit and deterministic while making the state fully discoverable and versioned.

### Introducing Spin for Rapid Development

* Spin is a new feature introduced in Metaflow 2.19 that further reduces the friction of the iterative development loop.
* It aims to provide the near-instant feedback of a notebook environment while operating within the structure of a production-ready Metaflow workflow.
* The tool helps developers manage the stateful nature of ML development, allowing for quick, incremental experimentation without losing continuity between code iterations.

To improve data science productivity and reduce "waiting time" during the development phase, engineering teams should look to adopt Metaflow 2.19 and integrate Spin into their experimentation workflows.
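As a quick illustration of the `@step` and `resume` mechanics described above, here is a toy Metaflow flow (not Netflix code): everything assigned to `self` inside a step is persisted as a versioned artifact at the step boundary, and a failed or edited step can be resumed without re-running the successful steps before it.

```python
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Everything assigned to self is persisted as a versioned artifact at
        # the step boundary, so later steps (and `resume`) can reuse it.
        self.rows = list(range(10))  # stand-in for an expensive data load
        self.next(self.featurize)

    @step
    def featurize(self):
        self.features = [r * 2 for r in self.rows]
        self.next(self.train)

    @step
    def train(self):
        # If this step fails or is edited, `python train_flow.py resume train`
        # reruns it while cloning the artifacts from the earlier steps.
        self.model = sum(self.features)
        self.next(self.end)

    @step
    def end(self):
        print("model:", self.model)

if __name__ == "__main__":
    TrainFlow()
```

Running `python train_flow.py run` executes the whole flow; `python train_flow.py resume train` restarts from the `train` step while reusing the `start` and `featurize` artifacts.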

google

Forecasting the future of forests with AI: From counting losses to predicting risk

Research from Google DeepMind and Google Research introduces ForestCast, a deep learning-based framework designed to transition forest management from retrospective loss monitoring to proactive risk forecasting. By utilizing vision transformers and pure satellite data, the team has developed a scalable method to predict future deforestation that matches or exceeds the accuracy of traditional models dependent on inconsistent manual inputs. This approach provides a repeatable, future-proof benchmark for protecting biodiversity and mitigating climate change on a global scale.

### Limitations of Traditional Forecasting

* Existing state-of-the-art models rely on specialized geospatial maps, such as infrastructure development, road networks, and regional economic indicators.
* These traditional inputs are often "patchy" and inconsistent across different countries, requiring manual assembly that is difficult to replicate globally.
* Manual data sources are not future-proof; they tend to go out of date quickly with no guarantee of regular updates, unlike continuous satellite streams.

### A Scalable Pure-Satellite Architecture

* The ForestCast model adopts a "pure satellite" approach, using only raw inputs from Landsat and Sentinel-2 satellites.
* The architecture is built on vision transformers (ViTs) that process an entire tile of pixels in a single pass to capture critical spatial context and landscape-level trends.
* The model incorporates a satellite-derived "change history" layer, which identifies previously deforested pixels and the specific year the loss occurred.
* By avoiding socio-political or infrastructure maps, the method can be applied consistently to any region on Earth, allowing for meaningful cross-regional comparisons.

### Key Findings and Benchmark Release

* Research indicates that "change history" is the most information-dense input; a model trained on this data alone performs almost as well as those using raw multi-spectral data.
* The model successfully predicts tile-to-tile variation in deforestation amounts and identifies the specific pixels most likely to be cleared next.
* Google has released the training and evaluation data as a public benchmark dataset, focusing initially on Southeast Asia to allow the machine learning community to verify and improve upon the results.

The release of ForestCast provides a template for scaling predictive modeling to Latin America, Africa, and boreal latitudes. Conservationists and policymakers should utilize these forecasting tools to move beyond counting historical losses and instead direct resources toward "frontline" areas where the model identifies imminent risk of habitat conversion.
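As a rough illustration of the input design described above, the toy numpy snippet below stacks spectral bands with a change-history channel and patchifies the tile for a transformer. The channel count, the years-since-loss encoding, and the patch size are illustrative assumptions, not the ForestCast specification.

```python
import numpy as np

# Hypothetical tile: 256x256 pixels with 10 Sentinel-2-like spectral bands.
H = W = 256
spectral = np.random.rand(10, H, W).astype(np.float32)

# "Change history" channel: 0 where forest is intact, otherwise how long ago
# the pixel was deforested (one plausible encoding, chosen for illustration).
years_since_loss = np.zeros((1, H, W), dtype=np.float32)
years_since_loss[0, 100:120, 40:80] = 3.0  # a patch cleared three years ago

# Stack everything into one multi-channel tile so a ViT sees spectral content
# and disturbance history jointly, in a single pass over the whole tile.
tile = np.concatenate([spectral, years_since_loss], axis=0)  # shape (11, H, W)

# Patchify for a vision transformer: non-overlapping 16x16 patches -> tokens.
P = 16
patches = tile.reshape(tile.shape[0], H // P, P, W // P, P)
tokens = patches.transpose(1, 3, 0, 2, 4).reshape((H // P) * (W // P), -1)
print(tokens.shape)  # (256, 2816): one flattened token per patch for the ViT
```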

line

Security threats and countermeasures in AI

Developing AI products introduces unique security vulnerabilities that extend beyond traditional software risks, ranging from package hallucinations to sophisticated indirect prompt injections. To mitigate these threats, organizations must move away from trusting LLM-generated content and instead implement rigorous validation, automated threat modeling, and input/output guardrails. The following summary details the specific risks and mitigation strategies identified by LY Corporation’s security engineering team.

## Slopsquatting and Package Hallucinations

- AI models frequently hallucinate non-existent library or package names when providing coding instructions (e.g., suggesting `huggingface-cli` instead of the correct `huggingface_hub[cli]`).
- Attackers exploit this by registering these hallucinated names on public registries to distribute malware to unsuspecting developers.
- Mitigation requires developers to manually verify all AI-suggested commands and dependencies before execution in any environment.

## Prompt Injection and Arbitrary Code Execution

- As seen in CVE-2024-5565 (Vanna AI), attackers can inject malicious instructions into prompts to force the application to execute arbitrary code.
- This vulnerability arises when developers grant LLMs the autonomy to generate and run logic within the application context without sufficient isolation.
- Mitigation involves treating LLM outputs as untrusted data, sanitizing user inputs, and strictly limiting the LLM's ability to execute system-level commands.

## Indirect Prompt Injection in Integrated AI

- AI assistants integrated into office environments (like Gemini for Workspace) are susceptible to indirect prompt injections hidden within emails or documents.
- A malicious email can contain "system-like" instructions that trick the AI into hiding content, redirecting users to phishing sites, or leaking data from other files.
- Mitigation requires the implementation of robust guardrails that scan both the input data (the content being processed) and the generated output for instructional anomalies.

## Permission Risks in AI Agents and MCP

- The use of Model Context Protocol (MCP) and coding agents creates risks where an agent might overstep its intended scope.
- If an agent has broad access to a developer's environment, a malicious prompt in a public repository could trick the agent into accessing or leaking sensitive data (such as salary info or private keys) from a private repository.
- Mitigation centers on the principle of least privilege, ensuring AI agents are restricted to specific, scoped directories and repositories.

## Embedding Inversion and Vector Store Vulnerabilities

- Attacks targeting the retrieval phase of RAG (Retrieval-Augmented Generation) systems can lead to data leaks.
- Embedding Inversion techniques may allow attackers to reconstruct original sensitive text from the vector embeddings stored in a database.
- Securing AI products requires protecting the integrity of the vector store and ensuring that retrieved context does not bypass security filters.

## Automated Security Assessment Tools

- To scale security, LY Corporation is developing internal tools like "ConA" for automated threat modeling and "LAVA" for automated vulnerability assessment.
- These tools aim to identify AI-specific risks during the design and development phases rather than relying solely on manual reviews.

Effective AI security requires a shift in mindset: treat every LLM response as a potential security risk. Developers should adopt automated threat modeling and implement strict input/output validation layers to protect both the application infrastructure and user data from evolving AI-based exploits.
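As one concrete example of the slopsquatting mitigation, the sketch below checks an AI-suggested dependency against PyPI's public JSON metadata endpoint before anyone runs `pip install`. The allowlist and the "still requires human approval" policy are hypothetical, not LY Corporation tooling.

```python
# Minimal guardrail against slopsquatting: before installing an AI-suggested
# dependency, confirm it exists on PyPI and surface its metadata for review
# instead of piping the suggestion straight into `pip install`.
import json
import sys
import urllib.error
import urllib.request

ALLOWLIST = {"requests", "numpy", "huggingface_hub"}  # packages vetted in advance (example only)

def check_package(name: str) -> bool:
    if name in ALLOWLIST:
        return True
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            meta = json.load(resp)
    except urllib.error.URLError:
        print(f"'{name}' was not found on PyPI (or PyPI was unreachable); treat it as hallucinated until verified.")
        return False
    info = meta["info"]
    # Existence alone is not enough: an attacker may have registered the
    # hallucinated name, so show metadata for a human to review.
    print(f"{name}: {info.get('summary')!r}, author: {info.get('author') or 'unknown'}")
    return False  # still requires manual approval in this sketch

if __name__ == "__main__":
    check_package(sys.argv[1] if len(sys.argv) > 1 else "huggingface-cli")
```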

google

Exploring a space-based, scalable AI infrastructure system design

Project Suncatcher is a Google moonshot initiative aimed at scaling machine learning infrastructure by deploying solar-powered satellite constellations equipped with Tensor Processing Units (TPUs). By leveraging the nearly continuous energy of the sun in specific orbits and utilizing high-bandwidth free-space optical links, the project seeks to bypass the resource constraints of terrestrial data centers. Early research suggests that a modular, tightly clustered satellite design can achieve the necessary compute density and communication speeds required for modern AI workloads.

### Data-Center Bandwidth via Optical Links

* To match terrestrial performance, inter-satellite links must support tens of terabits per second using multi-channel dense wavelength-division multiplexing (DWDM) and spatial multiplexing.
* The system addresses signal power loss (the link budget) by maintaining satellites in extremely close proximity—kilometers or less—compared to traditional long-range satellite deployments.
* Initial bench-scale demonstrations have successfully achieved 800 Gbps each-way transmission (1.6 Tbps total) using a single transceiver pair, validating the feasibility of high-speed optical networking.

### Orbital Mechanics of Compact Constellations

* The proposed system utilizes a sun-synchronous low-earth orbit (LEO) at an altitude of approximately 650 km to maximize solar exposure and minimize the weight of onboard batteries.
* Researchers use Hill-Clohessy-Wiltshire equations and JAX-based differentiable models to manage the complex gravitational perturbations and atmospheric drag affecting satellites flying in tight 100–200m formations.
* Simulations of 81-satellite clusters indicate that only modest station-keeping maneuvers are required to maintain stable, "free-fall" trajectories within the orbital plane.

### Hardware Resilience in Space Environments

* The project specifically tests Google’s Trillium (v6e) Cloud TPUs to determine if terrestrial AI accelerators can survive the radiation found in LEO.
* Hardware is subjected to 67MeV proton beams to analyze the impact of Total Ionizing Dose (TID) and Single Event Effects (SEEs) on processing reliability.
* Preliminary testing indicates promising results for the radiation tolerance of high-performance accelerators, suggesting that standard TPU architectures may be viable for orbital deployment with minimal modification.

While still in the research and development phase, Project Suncatcher suggests that the future of massive AI scaling may involve shifting infrastructure away from terrestrial limits and toward modular, energy-rich orbital environments. Organizations should monitor the progress of free-space optical communication and radiation-hardened accelerators as these technologies will be the primary gatekeepers for space-based computation.
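For readers unfamiliar with the Hill-Clohessy-Wiltshire model mentioned above, the toy integration below propagates a satellite's motion relative to a reference point on a circular ~650 km orbit. It uses plain numpy rather than the differentiable JAX models described in the post, and the initial conditions and step sizes are arbitrary illustrations.

```python
import numpy as np

# Hill-Clohessy-Wiltshire (HCW) linearized relative dynamics for a satellite
# flying close to a reference point on a circular orbit: x is radial,
# y is along-track, z is cross-track (distances in km).
MU = 398600.4418          # km^3/s^2, Earth's gravitational parameter
A = 6378.0 + 650.0        # km, semi-major axis for a ~650 km LEO
N = np.sqrt(MU / A**3)    # mean motion, rad/s

def hcw_derivative(state):
    x, y, z, vx, vy, vz = state
    ax = 3 * N**2 * x + 2 * N * vy   # radial
    ay = -2 * N * vx                 # along-track
    az = -(N**2) * z                 # cross-track
    return np.array([vx, vy, vz, ax, ay, az])

def propagate(state, dt=1.0, steps=6000):
    """Simple RK4 integration over roughly one ~98-minute orbit."""
    for _ in range(steps):
        k1 = hcw_derivative(state)
        k2 = hcw_derivative(state + 0.5 * dt * k1)
        k3 = hcw_derivative(state + 0.5 * dt * k2)
        k4 = hcw_derivative(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

# A satellite parked 0.2 km along-track from the cluster reference stays put
# without thrust: this is the "free-fall" formation flying the post describes.
final = propagate(np.array([0.0, 0.2, 0.0, 0.0, 0.0, 0.0]))
print("relative position after ~one orbit (km):", final[:3])
```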

toss

Working as a Toss Place Sil

Toss Place implements a dual-role QA structure where managers are embedded directly within product Silos from the initial planning stages to final deployment. This shift moves QA from a final-stage bottleneck to a proactive partner that enhances delivery speed and stability through deep historical context and early risk mitigation. Consequently, the organization has transitioned to a culture where quality is viewed as a shared team responsibility rather than a siloed functional task.

### Integrating QA into Product Silos

* QA managers belong to both a central functional team and specific product units (Silos) to ensure they are involved in the entire product lifecycle.
* Participation begins at the OKR design phase, allowing QA to align testing strategies with specific product intentions and business goals.
* Early involvement enables accurate risk assessment and scope estimation, preventing the "shallow testing" that often occurs when QA only sees the final product.

### Optimizing Spec Reviews and Sanity Testing

* The team introduced a structured flow consisting of Spec Reviews followed by Q&A sessions to reduce repetitive discussions and information gaps.
* All specification changes are centralized in shared design tools (such as Deus) or messenger threads to ensure transparency across all roles.
* "Sanity Test" criteria were established where developers and QA agree on "Happy Case" validations and minimum spec requirements before development begins, ensuring everyone starts from the same baseline.

### Collaborative Live Monitoring

* Post-release checklists were developed to involve the entire Silo in live monitoring, overcoming the limitations of having a single QA manager per unit.
* This collaborative approach encourages non-technical roles to interact with the live product, reinforcing the culture that quality is a collective team responsibility.

### Streamlining Issue Tracking and Communication

* The team implemented a "Send to Notion" workflow to instantly capture messenger-based feedback and ideas into a structured, prioritized backlog.
* To reduce communication fragmentation, they transitioned from Jira to integrated Messenger Lists and Canvases, which allowed for centralized discussions and faster issue resolution.
* Backlogs are prioritized based on user experience impact and release urgency, ensuring that critical bugs are addressed while minor improvements are tracked for future cycles.

The success of these initiatives demonstrates that QA effectiveness is driven by integration and autonomy rather than rigid adherence to specific tools. To achieve both high velocity and high quality, organizations should empower QA professionals to act as product peers who can flexibly adapt their processes to the unique needs and data-driven goals of their specific product teams.

google

Accelerating the magic cycle of research breakthroughs and real-world applications

Google Research is accelerating a "magic cycle" where breakthrough scientific discoveries and real-world applications continuously reinforce one another through advanced AI models and open platforms. By leveraging agentic tools and large-scale foundations, the company is transforming complex data into actionable insights across geospatial analysis, genomics, and quantum computing. This iterative process aims to solve critical global challenges while simultaneously uncovering new frontiers for future innovation.

### Earth AI and Geospatial Reasoning

* Google has integrated various geospatial models—including those for flood forecasting, wildfire tracking, and air quality—into a unified Earth AI program.
* The newly introduced Geospatial Reasoning Agent uses Large Language Models (LLMs) to allow non-experts to ask complex questions and receive plain-language answers derived from diverse datasets.
* Riverine flood models have been significantly expanded, now providing forecasts for over 2 billion people across 150 countries.
* New Remote Sensing and Population Dynamics Foundations have been released to help researchers understand nuanced correlations in planetary data and supply chain management.

### DeepSomatic and Genomic Research

* Building on ten years of genomics work, DeepSomatic is an AI tool designed to identify somatic mutations (genetic variants in tumors) to assist in cancer research.
* The tool follows the development of previous foundational models like DeepVariant and DeepConsensus, which helped map human and non-human genomes.
* These advancements aim to move the medical field closer to precision medicine by providing health practitioners with higher-resolution data on genetic variations.

### The Magic Cycle of Research and Development

* Google highlights "Quantum Echoes" as a key breakthrough in quantum computing, contributing to the broader goal of solving fundamental scientific problems through high-scale computation.
* The acceleration of discovery is largely attributed to "agentic tools" that assist scientists in navigating massive datasets and uncovering new research opportunities.
* The company emphasizes a collaborative approach, making foundation models available to trusted testers and partners like the WHO and various international research institutes.

To maximize the impact of these breakthroughs, organizations should look toward integrating multimodal AI agents that can bridge the gap between specialized scientific data and practical decision-making. By utilizing open platforms and foundation models, the broader scientific community can translate high-level research into scalable solutions for climate resilience, healthcare, and global policy.

toss

Toss People: Designing a

Data architecture is evolving from a reactive "cleanup" task into a proactive, end-to-end design process that ensures high data quality from the moment of creation. In fast-paced platform environments, the role of a Data Architect is to bridge the gap between rapid product development and reliable data structures, ultimately creating a foundation that both humans and AI can interpret accurately. By shifting from mere post-processing to foundational governance, organizations can maintain technical agility without sacrificing the integrity of their data assets.

**From Post-Processing to End-to-End Governance**

* Traditional data management often involves "fixing" or "matching puzzles" at the end of the pipeline after a service has already changed, leading to perpetual technical debt.
* Effective data architecture requires a culture where data is treated as a primary design object from its inception, rather than a byproduct of application development.
* The transition to an end-to-end governance model ensures that data quality is maintained throughout its entire lifecycle—from initial generation in production systems to final analysis and consumption.

**Machine-Understandable Data and Ontologies**

* Modern data design must move beyond human-readable metadata to structures that AI can autonomously process and understand.
* The implementation of semantic-based standard dictionaries and ontologies reduces the need for "inference" or guessing by either humans or machines.
* By explicitly defining the relationships and conceptual meanings of columns and tables, organizations create a high-fidelity environment where AI can provide accurate, context-aware responses without interpretive errors.

**Balancing Development Speed with Data Quality**

* In high-growth environments, insisting on "perfect" design can hinder competitive speed; therefore, architects must find a middle ground that allows for future extensibility.
* Practical strategies include designing for current needs while leaving "logical room" for anticipated changes, ensuring that future cleanup is minimally disruptive.
* Instead of enforcing rigid rules, architects should design systems where following the standard is the "path of least resistance," making high-quality data entry easier for developers than the alternative.

**The Role of the Modern Data Architect**

* The role has shifted from a fixed, corporate function to a dynamic problem-solver who uses structural design to solve business bottlenecks.
* A successful architect must act as a mediator, convincing stakeholders that investing in a 5% quality improvement (e.g., moving from 90 to 95 points) provides significant long-term ROI in decision-making and AI reliability.
* Aspiring architects should focus on incremental structural improvements, as any data professional who cares about how data functions is already operating on the path to data architecture.
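To make "machine-understandable" concrete, here is a small, hypothetical sketch of what a semantic standard-dictionary entry for a column might look like. The field names, concepts, and relationships are invented for illustration and are not Toss's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical "semantic standard dictionary" entry: each physical column is
# bound to an explicit concept, unit, and relationships, so neither humans nor
# AI have to guess what the field means.
@dataclass
class ColumnTerm:
    physical_name: str                  # name as it appears in the table
    concept: str                        # standardized business meaning
    data_type: str
    unit: Optional[str] = None
    relates_to: Dict[str, str] = field(default_factory=dict)  # relation -> concept

payment_amount = ColumnTerm(
    physical_name="pay_amt",
    concept="payment.transaction_amount",
    data_type="DECIMAL(18,2)",
    unit="KRW",
    relates_to={
        "measured_for": "payment.transaction",
        "aggregates_into": "settlement.daily_total",
    },
)

# Downstream tools (or an LLM) can read the declared relationships instead of
# inferring them from a terse physical column name.
print(payment_amount.concept, payment_amount.relates_to)
```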

google

Toward provably private insights into AI use

Google Research has introduced Provably Private Insights (PPI), a framework designed to analyze generative AI usage patterns while providing mathematical guarantees of user privacy. By integrating Large Language Models (LLMs) with differential privacy and trusted execution environments (TEEs), the system enables developers to derive aggregate trends from unstructured data without exposing individual user content. This approach ensures that server-side processing remains limited to privacy-preserving computations that are fully auditable by external parties.

### The Role of LLMs in Structured Summarization

The system employs "data expert" LLMs to transform unstructured generative AI data into actionable, structured insights.

* The framework utilizes open-source Gemma 3 models to perform specific analysis tasks, such as classifying transcripts into topics or identifying user frustration levels.
* This "structured summarization" occurs entirely within a TEE, ensuring that the model processes raw data in an environment inaccessible to human operators or external processes.
* Developers can update LLM prompts frequently to answer new research questions without compromising the underlying privacy architecture.

### Confidential Federated Analytics (CFA) Infrastructure

The PPI system is built upon Confidential Federated Analytics, a technique that isolates data through hardware-based security and cryptographic verification.

* User devices encrypt data and define specific authorized processing steps before uploading it to the server.
* A TEE-hosted key management service only releases decryption keys to processing steps that match public, open-source code signatures.
* System integrity is verified using Rekor, a public, tamper-resistant transparency log that allows external parties to confirm that the code running in the TEE is exactly what was published.

### Anonymization via Differential Privacy

Once the LLM extracts features from the data, the system applies differential privacy (DP) to ensure that the final output does not reveal information about any specific individual.

* The extracted categories are aggregated into histograms, with DP noise added to the final counts to prevent the identification of single users.
* Because the privacy guarantee is applied at the aggregation stage, the system remains secure even if a developer uses a prompt specifically designed to isolate a single user's data.
* All aggregation algorithms are open-source and reproducibly buildable, allowing for end-to-end verifiability of the privacy claims.

By open-sourcing the PPI stack through the Google Parfait project and deploying it in applications like Pixel Recorder, this framework establishes a new standard for transparent data analysis. Developers should look to integrate similar TEE-based federated analytics to balance the need for product insights with the necessity of provable, hardware-backed user privacy.
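The anonymization stage is the most mechanical part of the pipeline, so here is a toy sketch of a differentially private histogram: per-user categories (already extracted inside the TEE) are counted, and Laplace noise calibrated to the privacy budget is added to each bucket. The epsilon value and the specific noise mechanism are illustrative choices, not the parameters Google uses in PPI.

```python
import numpy as np

def dp_histogram(user_categories, categories, epsilon=1.0):
    """Aggregate one category per user into noisy counts.

    Each user contributes at most one category, so adding or removing a user
    changes one count by 1 (L1 sensitivity = 1), and Laplace noise with scale
    1/epsilon gives epsilon-differential privacy for the released histogram.
    """
    counts = np.array(
        [sum(1 for c in user_categories if c == cat) for cat in categories],
        dtype=float,
    )
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=len(categories))
    return dict(zip(categories, counts + noise))

# Example: topic labels extracted from transcripts, one per user.
reports = ["meeting_notes", "lecture", "meeting_notes", "interview", "lecture"]
print(dp_histogram(reports, ["meeting_notes", "lecture", "interview", "other"]))
```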

line

Code Quality Improvement Techniques Part 21

Designing objects that require a specific initialization sequence often leads to fragile code and runtime exceptions. When a class demands that a method like `prepare()` be called before its primary functionality becomes available, it places the burden of safety on the consumer rather than the structure of the code itself. To improve reliability, developers should aim to create "unbreakable" interfaces where an instance is either ready for use upon creation or restricted by the type system from being used incorrectly.

### Problems with "Broken" Constructors

* Classes that allow instantiation in an "unprepared" state rely on documentation or developer memory to avoid `IllegalStateException` errors.
* When an object is passed across different layers of an application, it becomes difficult to track whether the required setup logic has been executed.
* Relying on runtime checks to verify internal state increases the surface area for bugs that only appear during specific execution paths.

### Immediate Initialization and Factory Patterns

* The most direct solution is to move initialization logic into the `init` block, allowing properties to be defined as read-only (`val`).
* Because constructors have limitations—such as the inability to use `suspend` functions or handle complex side effects—a private constructor combined with a static factory method (e.g., `companion object` in Kotlin) is often preferred.
* Using a factory method like `createInstance()` ensures that all necessary preparation logic is completed before a user ever receives the object instance.

### Lazy and Internal Preparation

* If the initialization process is computationally expensive and might not be needed for every instance, "lazy" initialization can defer the cost until the first time a functional method is called.
* In Kotlin, the `by lazy` delegate can be used to encapsulate preparation logic, ensuring it only runs once and remains thread-safe.
* Alternatively, the class can handle preparation internally within its main methods, checking the initialization state automatically so the user does not have to manage it manually.

### Type-Safe State Transitions

* For complex lifecycles, the type system can be used to enforce order by splitting the object into two distinct classes: one for the "unprepared" state and one for the "prepared" state.
* The initial class contains only the `prepare()` method, which returns a new instance of the "Prepared" class upon completion.
* This approach makes it a compile-time impossibility to call methods like `play()` on an object that hasn't been prepared, effectively eliminating a whole category of runtime errors.

### Recommendations

When designing classes with internal states, prioritize structural safety by making it impossible to represent an invalid state. Use factory functions for complex setup logic and consider splitting classes into separate types if they have distinct "ready" and "not ready" phases to leverage the compiler for error prevention.
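The article's own examples are written in Kotlin; the sketch below shows the same "split the class by state" idea as a Python analogue, with invented class names. Because the unprepared type simply has no `play()` method, misuse fails at the call site (and under a type checker) instead of with an `IllegalStateException` at runtime.

```python
# Python analogue of the type-safe state transition described above: the
# unprepared and prepared states are separate classes, so calling play()
# before prepare() is not expressible. Names are illustrative only.
class PreparedPlayer:
    def __init__(self, decoded_track: bytes):
        self._track = decoded_track   # guaranteed to exist: set during prepare()

    def play(self) -> None:
        print(f"playing {len(self._track)} bytes")

class UnpreparedPlayer:
    def __init__(self, source_path: str):
        self._source_path = source_path

    def prepare(self) -> PreparedPlayer:
        # All expensive setup lives here; only a ready-to-use object is returned.
        decoded = self._source_path.encode()   # stand-in for real decoding work
        return PreparedPlayer(decoded)

# The only way to obtain something with play() is to go through prepare().
player = UnpreparedPlayer("song.mp3").prepare()
player.play()
```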

google

StreetReaderAI: Towards making street view accessible via context-aware multimodal AI

StreetReaderAI is a research prototype designed to make immersive street-level imagery accessible to the blind and low-vision community through multimodal AI. By integrating real-time scene analysis with context-aware geographic data, the system transforms visual mapping data into an interactive, audio-first experience. This framework allows users to virtually explore environments and plan routes with a level of detail and independence previously unavailable through traditional screen readers.

### Navigation and Spatial Awareness

The system offers an immersive, first-person exploration interface that mimics the mechanics of accessible gaming.

* Users navigate using keyboard shortcuts or voice commands, taking "virtual steps" forward or backward and panning their view in 360 degrees.
* Real-time audio feedback provides cardinal and intercardinal directions, such as "Now facing North," to maintain spatial orientation.
* Distance tracking informs the user how far they have traveled between panoramic images, while "teleport" features allow for quick jumps to specific addresses or landmarks.

### Context-Aware AI Describer

At the core of the tool is a subsystem backed by Gemini that synthesizes visual and geographic data to generate descriptions.

* The AI Describer combines the current field-of-view image with dynamic metadata about nearby roads, intersections, and points of interest.
* Two distinct modes cater to different user needs: a "Default" mode focusing on pedestrian safety and navigation, and a "Tour Guide" mode that provides historical and architectural details.
* The system utilizes Gemini to proactively predict and suggest follow-up questions relevant to the specific scene, such as details about crosswalks or building entrances.

### Interactive Dialogue and Session Memory

StreetReaderAI utilizes the Multimodal Live API to facilitate real-time, natural language conversations about the environment.

* The AI Chat agent maintains a large context window of approximately 1,048,576 tokens, allowing it to retain a "memory" of up to 4,000 previous images and interactions.
* This memory allows users to ask retrospective spatial questions, such as "Where was that bus stop I just passed?", with the agent providing relative directions based on the user's current location.
* By tracking every pan and movement, the agent can provide specific details about the environment that were captured in previous steps of the virtual walk.

### User Evaluation and Practical Application

Testing with blind screen reader users confirmed the system's utility in practical, real-world scenarios.

* Participants successfully used the prototype to evaluate potential walking routes, identifying critical environmental features like the presence of benches or shelters at bus stops.
* The study highlighted the importance of multimodal inputs—combining image recognition with structured map data—to provide a more accurate and reliable description than image analysis alone could offer.

While StreetReaderAI remains a proof-of-concept, it demonstrates that the integration of multimodal LLMs and spatial data can bridge significant accessibility gaps in digital mapping. Future implementation of these technologies could transform how visually impaired individuals interact with the world, turning static street imagery into a functional tool for independent mobility and exploration.
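As a purely hypothetical sketch of the context-aware prompting described above, the snippet below pairs the current field-of-view image with nearby-place metadata before it would be handed to a multimodal model. The function names, prompt wording, and metadata fields are assumptions for illustration, not the StreetReaderAI implementation or a real Gemini API call.

```python
# Hypothetical assembly of a context-aware "describer" request: image plus
# structured geographic metadata, with a prompt variant per mode.
DEFAULT_PROMPT = (
    "You are assisting a blind pedestrian. Describe this street-level view, "
    "prioritizing crossings, obstacles, and entrances. The user is facing "
    "{heading}. Nearby places from map data: {places}."
)

def build_describer_request(image_bytes: bytes, heading: str,
                            nearby_places: list, mode: str = "default") -> dict:
    places = "; ".join(
        f"{p['name']} ({p['distance_m']} m {p['bearing']})" for p in nearby_places
    )
    prompt = DEFAULT_PROMPT.format(heading=heading, places=places)
    if mode == "tour_guide":
        prompt += " Also add brief historical or architectural context."
    # The returned payload would then be sent to a multimodal model.
    return {"prompt": prompt, "image": image_bytes}

request = build_describer_request(
    image_bytes=b"...",                      # panorama crop for the current view
    heading="north",
    nearby_places=[{"name": "bus stop", "distance_m": 25, "bearing": "northeast"}],
)
print(request["prompt"])
```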

google

How we are building the personal health coach

Google is leveraging Gemini models to create a proactive, adaptive personal health coach designed to bridge the gap between fragmented health data and actionable wellness guidance. By integrating physiological metrics with behavioral science, the system provides tailored insights and sustainable habit-building plans through a sophisticated multi-agent AI architecture. This initiative, currently in public preview for Fitbit Premium users, represents a transition toward data-driven, expert-validated health coaching that evolves dynamically with an individual's progress.

## Architecting a Multi-Agent Health Coach

The system utilizes a complex multi-agent framework to coordinate different specialized AI sub-agents, ensuring that health recommendations are holistic and contextually aware.

* **Conversational Agent:** Manages multi-turn interactions, understands user intent, and orchestrates the other agents while gathering necessary context for response generation.
* **Data Science Agent:** Employs code-generation capabilities to iteratively fetch, analyze, and summarize physiological time-series data, such as sleep patterns and workout intensity.
* **Domain Expert Agent:** Analyzes user data through the lens of specific fields like fitness or nutrition to generate and adapt personalized plans based on changing user context.
* **Numerical Reasoning:** The coach performs sophisticated reasoning on health metrics, comparing current data against personal baselines and population-level statistics using capabilities derived from PH-LLM research.

## Ensuring Reliability via the SHARP Framework

To move beyond general-purpose AI capabilities, the system is grounded in established coaching frameworks and subjected to rigorous technical and clinical validation.

* **SHARP Evaluation:** The model is continuously assessed across five dimensions: Safety, Helpfulness, Accuracy, Relevance, and Personalization.
* **Human-in-the-Loop Validation:** The development process involved over 1 million human annotations and 100,000 hours of evaluation by specialists in fields such as cardiology, endocrinology, and behavioral science.
* **Expert Oversight:** Google convened a Consumer Health Advisory Panel and collaborated with professional fitness coaches to ensure the AI's recommendations align with real-world professional standards.
* **Scientific Grounding:** The coach utilizes novel methods to foster consensus in nuanced health areas, ensuring that wellness recommendations remain scientifically accurate through the use of scaled "autoraters."

Eligible Fitbit Premium users on Android in the US can now opt into the public preview to provide feedback on these personalized insights. As the tool evolves through iterative design and user research, it aims to provide a seamless connection between raw health metrics and sustainable lifestyle changes.
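To illustrate the kind of baseline comparison attributed to the Data Science Agent above, here is a toy example that flags a night of sleep deviating from a personal 28-day baseline. The window length and thresholds are illustrative assumptions, not Fitbit or Gemini logic.

```python
import numpy as np

def sleep_insight(minutes_per_night, window: int = 28) -> str:
    """Compare the latest night against a personal rolling baseline.

    Uses a simple z-score against the previous `window` nights; the +/-1.0
    cutoff is an arbitrary illustrative threshold.
    """
    history = np.array(minutes_per_night[-(window + 1):-1], dtype=float)
    latest = minutes_per_night[-1]
    baseline, spread = history.mean(), history.std()
    z = (latest - baseline) / spread if spread > 0 else 0.0
    if z <= -1.0:
        return f"Last night ({latest:.0f} min) was well below your ~{baseline:.0f} min baseline."
    if z >= 1.0:
        return f"Last night ({latest:.0f} min) was above your ~{baseline:.0f} min baseline."
    return f"Last night was in line with your ~{baseline:.0f} min baseline."

# 28 typical nights (~7 hours) followed by one short night.
rng = np.random.default_rng(0)
nights = list(rng.normal(420, 25, size=28)) + [330.0]
print(sleep_insight(nights))
```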

netflix

Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning | by Netflix Technology Blog | Netflix TechBlog

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish between habitual clicks and genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without the need for complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, utilize transformer architectures to treat recommendation as a sequential transduction task.
* The models are typically trained using next-item prediction, where the system learns to imitate the chronological sequence of a user’s activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used in Large Language Models, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate and test hypothetical, counterfactual experiences for real-time user validation.
* Reward signals in recommendation systems are inherently noisy; for instance, high watch time might indicate interest, but it can also be a result of external circumstances, making it an unreliable metric for optimization.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, allowing the model to prioritize actions that lead to higher rewards while filtering out noise from the reward model.
* This method is specifically designed to handle high-variance reward signals, using them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to implementing post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback—like watch time and clicks—without the infrastructure hurdles of online reinforcement learning.
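The summary above does not spell out the exact weighting function, so the sketch below uses an exponentiated, clipped advantage weight in the style of advantage-weighted regression purely to illustrate the idea: the next-item prediction loss for each observed interaction is scaled by how much better than baseline its (noisy) reward was. The function and hyperparameters are illustrative assumptions, not Netflix's A-SFT implementation.

```python
import torch
import torch.nn.functional as F

def advantage_weighted_sft_loss(
    logits: torch.Tensor,        # (batch, num_items) next-item logits from the recommender
    target_items: torch.Tensor,  # (batch,) observed next items
    rewards: torch.Tensor,       # (batch,) noisy rewards, e.g. watch-time-derived
    baseline: torch.Tensor,      # (batch,) expected reward (value/baseline estimate)
    beta: float = 1.0,
    max_weight: float = 10.0,
) -> torch.Tensor:
    advantage = rewards - baseline
    # Higher-than-expected reward -> larger weight; clipping keeps noisy rewards
    # acting as a directional guide rather than absolute truth.
    weights = torch.clamp(torch.exp(advantage / beta), max=max_weight).detach()
    nll = F.cross_entropy(logits, target_items, reduction="none")
    return (weights * nll).mean()

# Example usage with random tensors standing in for a generative recommender.
logits = torch.randn(4, 1000)
targets = torch.randint(0, 1000, (4,))
rewards = torch.tensor([0.9, 0.1, 0.5, 0.7])
baseline = torch.full((4,), 0.5)
print(advantage_weighted_sft_loss(logits, targets, rewards, baseline))
```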

google

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning

Google Earth AI introduces a framework of geospatial foundation models and reasoning agents designed to solve complex, planetary-scale challenges through cross-modal reasoning. By integrating Gemini-powered orchestrators with specialized imagery, population, and environmental models, the system deconstructs multifaceted queries into actionable multi-step plans. This approach enables a holistic understanding of real-world events, such as disaster response and disease forecasting, by grounding AI insights in diverse geospatial data.

## Geospatial Reasoning Agents

* Utilizes Gemini models as intelligent orchestrators to manage complex queries that require data from multiple domains.
* The agent deconstructs a high-level question—such as predicting hurricane landfalls and community vulnerability—into a sequence of smaller, executable tasks.
* It executes these plans by autonomously calling specialized foundation models, querying vast datastores, and utilizing geospatial tools to fuse disparate data points into a single, cohesive answer.

## Remote Sensing and Imagery Foundations

* Employs vision-language models and open-vocabulary object detection trained on a large corpus of high-resolution overhead imagery paired with text descriptions.
* Enables "zero-shot" capabilities, allowing users to find specific objects like "flooded roads" or "building damage" using natural language without needing to retrain the model for specific classes.
* Technical evaluations show a 16% average improvement on text-based image search tasks and more than double the baseline accuracy for detecting novel objects in a zero-shot setting.

## Population Dynamics and Mobility

* Focuses on the interplay between people and places using globally-consistent embeddings across 17 countries.
* Includes monthly updated embeddings that capture shifting human activity patterns, which are essential for time-sensitive forecasting.
* Research conducted with the University of Oxford showed that incorporating these population embeddings into a Dengue fever forecasting model in Brazil improved the R² metric from 0.456 to 0.656 for long-range 12-month predictions.

## Environmental and Disaster Forecasting

* Integrates established Google research into weather nowcasting, flood forecasting, and wildfire boundary mapping.
* Provides the reasoning agent with the data necessary to evaluate environmental risks alongside population density and infrastructure imagery.
* Aims to provide Search and Maps users with real-time, accurate alerts regarding natural disasters grounded in planetary-scale environmental data.

Developers and enterprises looking to solve high-level geospatial problems can now express interest in accessing these capabilities through Google Earth and Google Cloud. By leveraging these foundation models, organizations can automate the analysis of satellite imagery and human mobility data to better prepare for environmental and social challenges.