multi-agent-systems

3 posts

google

How we are building the personal health coach

Google is leveraging Gemini models to create a proactive, adaptive personal health coach designed to bridge the gap between fragmented health data and actionable wellness guidance. By integrating physiological metrics with behavioral science, the system provides tailored insights and sustainable habit-building plans through a sophisticated multi-agent AI architecture. This initiative, currently in public preview for Fitbit Premium users, represents a transition toward data-driven, expert-validated health coaching that evolves dynamically with an individual's progress.

## Architecting a Multi-Agent Health Coach

The system utilizes a multi-agent framework to coordinate different specialized AI sub-agents, ensuring that health recommendations are holistic and contextually aware.

* **Conversational Agent:** Manages multi-turn interactions, understands user intent, and orchestrates the other agents while gathering necessary context for response generation.
* **Data Science Agent:** Employs code-generation capabilities to iteratively fetch, analyze, and summarize physiological time-series data, such as sleep patterns and workout intensity.
* **Domain Expert Agent:** Analyzes user data through the lens of specific fields like fitness or nutrition to generate and adapt personalized plans based on changing user context.
* **Numerical Reasoning:** The coach performs sophisticated reasoning on health metrics, comparing current data against personal baselines and population-level statistics using capabilities derived from PH-LLM research.

## Ensuring Reliability via the SHARP Framework

To move beyond general-purpose AI capabilities, the system is grounded in established coaching frameworks and subjected to rigorous technical and clinical validation.

* **SHARP Evaluation:** The model is continuously assessed across five dimensions: Safety, Helpfulness, Accuracy, Relevance, and Personalization.
* **Human-in-the-Loop Validation:** The development process involved over 1 million human annotations and 100,000 hours of evaluation by specialists in fields such as cardiology, endocrinology, and behavioral science.
* **Expert Oversight:** Google convened a Consumer Health Advisory Panel and collaborated with professional fitness coaches to ensure the AI's recommendations align with real-world professional standards.
* **Scientific Grounding:** The coach utilizes novel methods to foster consensus in nuanced health areas, ensuring that wellness recommendations remain scientifically accurate through the use of scaled "autoraters."

Eligible Fitbit Premium users on Android in the US can now opt into the public preview to provide feedback on these personalized insights. As the tool evolves through iterative design and user research, it aims to provide a seamless connection between raw health metrics and sustainable lifestyle changes.
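The conversational/data-science/domain-expert split described above can be pictured with a short Python sketch. Everything here is an illustrative assumption: the class names, routing prompt, and the `toy_llm` stand-in are not from Google's system, which runs on Gemini models with far richer context handling.

```python
# Minimal sketch of the orchestration pattern: a conversational agent routes
# each user turn to specialist sub-agents and synthesizes their outputs.
# All names and prompts are hypothetical; `llm` stands in for the model call.
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # prompt in, completion out (assumed interface)


@dataclass
class SubAgent:
    name: str
    system_prompt: str
    llm: LLM

    def run(self, query: str, context: dict) -> str:
        return self.llm(f"{self.system_prompt}\n\nContext: {context}\n\nTask: {query}")


@dataclass
class ConversationalCoach:
    """Orchestrates specialist sub-agents for a single coaching turn."""
    llm: LLM
    data_science: SubAgent
    domain_expert: SubAgent

    def respond(self, user_message: str, wearable_context: dict) -> str:
        # Step 1: decide which specialists this question needs.
        route = self.llm(
            "Answer DATA, ADVICE, or BOTH. Which specialist does this health "
            f"question need?\n{user_message}"
        ).strip().upper()

        findings = []
        if route in ("DATA", "BOTH"):
            findings.append(self.data_science.run(user_message, wearable_context))
        if route in ("ADVICE", "BOTH"):
            findings.append(self.domain_expert.run(user_message, wearable_context))
        if not findings:  # fall back to the domain expert on an unclear route
            findings.append(self.domain_expert.run(user_message, wearable_context))

        # Step 2: synthesize the specialists' outputs into one coaching reply.
        return self.llm(
            "Write a single, friendly coaching reply based on these findings:\n"
            + "\n---\n".join(findings)
        )


if __name__ == "__main__":
    # Toy stand-in LLM so the sketch runs end to end without any API.
    def toy_llm(prompt: str) -> str:
        return "BOTH" if prompt.startswith("Answer DATA") else f"[reply to: {prompt[:60]}...]"

    coach = ConversationalCoach(
        llm=toy_llm,
        data_science=SubAgent("data_science", "Analyze wearable time series.", toy_llm),
        domain_expert=SubAgent("domain_expert", "Give fitness and nutrition guidance.", toy_llm),
    )
    print(coach.respond("How is my sleep affecting my training?", {"avg_sleep_h": 6.2}))
```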

google

The anatomy of a personal health agent

Google researchers have developed the Personal Health Agent (PHA), an LLM-powered prototype designed to provide evidence-based, personalized health insights by analyzing multimodal data from wearables and blood biomarkers. By utilizing a specialized multi-agent architecture, the system deconstructs complex health queries into specific tasks to ensure statistical accuracy and clinical grounding. The study demonstrates that this modular approach significantly outperforms standard large language models in providing reliable, data-driven wellness support.

## Multi-Agent System Architecture

* The PHA framework adopts a "team-based" approach, utilizing three specialist sub-agents: a Data Science agent, a Domain Expert agent, and a Health Coach agent.
* The system was validated using a real-world dataset from 1,200 participants, featuring longitudinal Fitbit data, health questionnaires, and clinical blood test results.
* This architecture was designed after a user-centered study of 1,300 health queries, which identified four key needs: general knowledge, data interpretation, wellness advice, and symptom assessment.
* Evaluation involved over 1,100 hours of human expert effort across 10 benchmark tasks to ensure the system outperformed base models like Gemini.

## The Data Science Agent

* This agent specializes in "contextualized numerical insights," transforming ambiguous queries (e.g., "How is my fitness trending?") into formal statistical analysis plans.
* It operates through a two-stage process: first interpreting the user's intent and data sufficiency, then generating executable code to analyze time-series data.
* In benchmark testing, the agent achieved a 75.6% score in analysis planning, significantly higher than the 53.7% score achieved by the base model.
* The agent's code generation was validated against 173 rigorous unit tests written by human data scientists to ensure accuracy in handling wearable sensor data.

## The Domain Expert Agent

* Designed for high-stakes medical accuracy, this agent functions as a grounded source of health knowledge using a multi-step reasoning framework.
* It utilizes a "toolbox" approach, granting the LLM access to authoritative external databases such as the National Center for Biotechnology Information (NCBI) to provide verifiable facts.
* The agent is specifically tuned to tailor information to the user's unique profile, including specific biomarkers and pre-existing medical conditions.
* Performance was measured through board certification and coaching exam questions, as well as the agent's ability to provide accurate differential diagnoses compared to human clinicians.

While currently a research framework rather than a public product, the PHA demonstrates that a modular, specialist-driven AI architecture is essential for safe and effective personal health management. Developers of future health-tech tools should prioritize grounding LLMs in external clinical databases and implementing rigorous statistical validation stages to move beyond the limitations of general-purpose chatbots.
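To make the Data Science agent's "contextualized numerical insights" concrete, here is a minimal, hypothetical example of the kind of analysis code such an agent might generate for the "How is my fitness trending?" query. The choice of metric (resting heart rate), the window lengths, and the thresholds are assumptions for illustration, not details taken from the paper.

```python
# Sketch of generated analysis code: compare a recent window of resting heart
# rate against a longer personal baseline and phrase the result as an insight.
import numpy as np
import pandas as pd


def fitness_trend_insight(daily_rhr: pd.Series,
                          baseline_days: int = 90,
                          recent_days: int = 14) -> str:
    """Compare the recent resting-heart-rate average against a personal baseline."""
    baseline = daily_rhr.iloc[-(baseline_days + recent_days):-recent_days].mean()
    recent = daily_rhr.iloc[-recent_days:].mean()
    delta = recent - baseline

    # A lower resting heart rate generally tracks improving aerobic fitness.
    if delta <= -1.0:
        direction = "improving"
    elif delta >= 1.0:
        direction = "declining"
    else:
        direction = "stable"
    return (f"Recent {recent_days}-day resting HR is {recent:.1f} bpm vs. a "
            f"{baseline_days}-day baseline of {baseline:.1f} bpm ({delta:+.1f} bpm): "
            f"trend looks {direction}.")


if __name__ == "__main__":
    # Synthetic wearable data: 120 days of resting heart rate with a slow decline.
    days = pd.date_range("2024-01-01", periods=120, freq="D")
    rhr = pd.Series(62 - np.linspace(0, 2, 120) + np.random.normal(0, 0.8, 120),
                    index=days)
    print(fitness_trend_insight(rhr))
```

In the paper's framing, code like this would be the output of the second stage; the first stage is the LLM turning the vague question into the analysis plan that the code implements.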

google

Enabling physician-centered oversight for AMIE

Guardrailed-AMIE (g-AMIE) is a diagnostic AI framework designed to perform patient history-taking while strictly adhering to safety guardrails that prevent it from providing direct medical advice. By decoupling data collection from clinical decision-making, the system enables an asynchronous oversight model in which primary care physicians (PCPs) review and finalize AI-generated medical summaries. In virtual clinical trials, g-AMIE’s diagnostic outputs and patient communications were preferred by overseeing physicians and patient actors over those of human-led control groups.

## Multi-Agent Architecture and Guardrails

* The system utilizes a multi-agent setup powered by Gemini 2.0 Flash, consisting of a dialogue agent, a guardrail agent, and a SOAP note agent.
* The dialogue agent conducts history-taking in three distinct phases: general information gathering, targeted validation of a differential diagnosis, and a conclusion phase for patient questions.
* A dedicated guardrail agent monitors and rephrases responses in real time to ensure the AI abstains from sharing individualized diagnoses or treatment plans directly with the patient.
* The SOAP note agent employs sequential multi-step generation to separate summarization tasks (Subjective and Objective) from more complex inferential tasks (Assessment and Plan).

## The Clinician Cockpit and Asynchronous Oversight

* To facilitate human review, researchers developed the "clinician cockpit," a web interface co-designed with outpatient physicians through semi-structured interviews.
* The interface is structured around the standard SOAP note format, presenting the patient’s perspective, measurable data, differential diagnosis, and proposed management strategy.
* This framework allows overseeing PCPs to review cases asynchronously, editing the AI’s proposed differential diagnoses and management plans before sharing a final message with the patient.
* The separation of history-taking from decision-making ensures that licensed medical professionals retain ultimate accountability for patient care.

## Performance Evaluation via Virtual OSCE

* The system was evaluated in a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) involving 60 case scenarios.
* g-AMIE’s performance was compared against primary care physicians, nurse practitioners, and physician assistants who were required to operate under the same restrictive guardrails.
* Overseeing PCPs and independent physician raters preferred g-AMIE’s diagnostic accuracy and management plans over those of the human control groups.
* Patient actors reported a preference for the messages generated by g-AMIE over those drafted by human clinicians in the study.

While g-AMIE demonstrates strong potential for human-AI collaboration in diagnostics, the researchers emphasize that the results should be interpreted with caution: the workflow was specifically optimized for AI characteristics, and human clinicians may require specialized training to perform effectively within such restrictive guardrail frameworks.
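The guardrail idea, drafting a reply and then screening it so that no individualized diagnosis or treatment reaches the patient, can be pictured as a post-processing step. In g-AMIE the guardrail is itself a Gemini-powered agent that rephrases responses rather than a keyword filter; the patterns and deferral wording below are purely illustrative stand-ins for that behavior.

```python
# Minimal sketch of the guardrail pattern: if a drafted reply looks like it
# shares a diagnosis or treatment plan, replace it with a deferral so the
# overseeing physician remains the one who communicates clinical decisions.
# Phrase list and deferral text are hypothetical, not from the g-AMIE system.
import re

RESTRICTED_PATTERNS = [
    r"\byou (likely |probably |may )?have\b",
    r"\byour diagnosis\b",
    r"\bI recommend (taking|starting)\b",
    r"\byou should take\b",
]

DEFERRAL = ("Thanks for sharing that. I'll pass these details to the overseeing "
            "physician, who will review them and follow up with an assessment "
            "and next steps.")


def apply_guardrail(draft_reply: str) -> str:
    """Return the draft unchanged if it only gathers history; otherwise defer."""
    for pattern in RESTRICTED_PATTERNS:
        if re.search(pattern, draft_reply, flags=re.IGNORECASE):
            return DEFERRAL
    return draft_reply


if __name__ == "__main__":
    print(apply_guardrail("How long have you had the chest pain, and does it "
                          "change when you breathe deeply?"))
    print(apply_guardrail("You likely have costochondritis; I recommend taking "
                          "ibuprofen twice a day."))
```

The key design point this mirrors is the decoupling: the history-taking dialogue can proceed freely, while anything resembling clinical decision-making is held back for the asynchronous physician review described above.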