Google Research has developed "Wayfinding AI," a research prototype based on Gemini and designed to transform health information seeking from a passive query-response model into a proactive, context-seeking dialogue. By prioritizing clarifying questions and iterative guidance, the agent addresses a common struggle: users often find it hard to articulate complex or ambiguous medical concerns. User studies indicate that participants found the health information produced by this proactive approach significantly more helpful, relevant, and tailored to their specific needs than responses from a baseline AI model.
### Challenges in Digital Health Navigation
* Formative research involving 33 participants highlighted that users often struggle to articulate health concerns because they lack the clinical background to know which details are medically relevant.
* The study found that users typically "throw words" at a search engine and sift through generic, impersonal results that do not account for their unique context.
* Initial UX testing revealed a strong user preference for a "deferred-answer" approach, where the AI mimics a medical professional by asking clarifying questions before jumping to a conclusion.
### Core Design Principles of Wayfinding AI
* **Proactive Conversational Guidance:** At every turn, the agent asks up to three targeted questions to reduce ambiguity and help users systematically share their "health story."
* **Best-Effort Answers:** To ensure immediate utility, the AI provides the best possible information based on the data available at that moment, while noting that the answer will improve as the user provides more context.
* **Transparent Reasoning:** The system explicitly explains how the user’s most recent answers have helped refine the previous response, so users can see why the answer changed.
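The three principles suggest a simple shape for each agent turn. The sketch below bundles the three parts into one object: a best-effort answer, a note explaining the refinement, and at most three clarifying questions. The schema, the field names, and the assumption that candidate questions arrive pre-ranked are all hypothetical. The actual model calls are not shown, and this is not the Wayfinding AI implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Per-turn cap taken from the description above ("up to three targeted questions").
MAX_QUESTIONS_PER_TURN = 3


@dataclass
class WayfindingTurn:
    """One agent turn combining the three principles (hypothetical schema)."""
    best_effort_answer: str      # best information available at this point
    refinement_note: str         # how the latest user reply changed the answer
    clarifying_questions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject turns that would overload the user with questions.
        if len(self.clarifying_questions) > MAX_QUESTIONS_PER_TURN:
            raise ValueError(
                f"at most {MAX_QUESTIONS_PER_TURN} clarifying questions per turn"
            )


def build_turn(draft_answer: str,
               ranked_questions: List[str],
               latest_reply: Optional[str] = None) -> WayfindingTurn:
    """Assemble a turn from model outputs produced elsewhere (not shown).

    `ranked_questions` is assumed to be sorted by how much ambiguity each
    question would resolve; only the top few are kept.
    """
    questions = ranked_questions[:MAX_QUESTIONS_PER_TURN]
    if latest_reply is None:
        note = "First answer; it will improve as you share more context."
    else:
        note = f"Updated based on your last reply: {latest_reply!r}"
    return WayfindingTurn(draft_answer, note, questions)
```

The sketch truncates a ranked candidate list rather than trusting the model to produce exactly three questions. That way the cap is enforced in code even if the model over-generates.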
### Split-Stream User Interface
* To prevent clarifying questions from being buried in long paragraphs, the prototype uses a two-column layout.
* The left column is dedicated to the interactive chat and specific follow-up questions to keep the user focused on the dialogue.
* The right column displays the "best information so far" and detailed explanations, allowing users to dive into the technical content only when they feel enough context has been established.
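The split-stream layout amounts to routing different parts of one model response to different panes. The sketch below assumes a hypothetical `AgentTurn` structure for the model's output. It sends the short chat reply and the questions to the left pane and the long-form "best information so far" to the right pane. It also renders the questions as a numbered list, so they are never buried inside a paragraph.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AgentTurn:
    """Model output for one turn (hypothetical fields)."""
    chat_message: str        # short conversational reply
    questions: List[str]     # clarifying questions for this turn
    best_info_so_far: str    # current best-effort answer
    details: str             # longer explanation and reasoning


def split_stream(turn: AgentTurn) -> Tuple[Dict, Dict]:
    """Route dialogue to the left pane and long-form content to the right pane."""
    left = {"chat": turn.chat_message, "questions": list(turn.questions)}
    right = {"best_info_so_far": turn.best_info_so_far, "details": turn.details}
    return left, right


def render_left(pane: Dict) -> str:
    """Render the chat pane with questions as a numbered list, not inside prose."""
    lines = [pane["chat"], ""]
    lines += [f"{i}. {q}" for i, q in enumerate(pane["questions"], start=1)]
    return "\n".join(lines)
```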
### Comparative Evaluation and Performance
* A randomized study with 130 participants compared the Wayfinding AI against a baseline Gemini 2.5 Flash model.
* Participants interacted with both models for at least three minutes regarding a personal health question and rated them across six dimensions: helpfulness, question relevance, tailoring, goal understanding, ease of use, and efficiency.
* The proactive agent significantly outperformed the baseline, and participants reported that its context-seeking behavior felt more professional and increased their confidence in the AI's suggestions.
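Because each participant rated both models, the comparison is paired (within-subject). The source does not say which statistical test was used. The sketch below swaps in a sign-flip permutation test on per-participant rating differences, one common choice for paired data. The rating-record layout and the dimension keys are hypothetical, and the toy ratings in the usage check are made up, not study data.

```python
import random
from typing import Dict, List, Tuple

# The six rating dimensions listed above.
DIMENSIONS = ["helpfulness", "question_relevance", "tailoring",
              "goal_understanding", "ease_of_use", "efficiency"]


def paired_diffs(ratings: List[Dict], dim: str) -> List[float]:
    """Per-participant difference (Wayfinding minus baseline) on one dimension."""
    return [r["wayfinding"][dim] - r["baseline"][dim] for r in ratings]


def sign_flip_p_value(diffs: List[float], n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired permutation test.

    Under the null hypothesis the two model labels are exchangeable within each
    participant, so each difference is equally likely to have either sign.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_iter):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)


def summarize(ratings: List[Dict]) -> Dict[str, Tuple[float, float]]:
    """Mean paired difference and permutation p-value for every dimension."""
    out = {}
    for dim in DIMENSIONS:
        d = paired_diffs(ratings, dim)
        out[dim] = (sum(d) / len(d), sign_flip_p_value(d))
    return out
```

A paired test fits this design better than comparing two independent groups. Each participant's own baseline rating controls for how generous that person is with scores in general.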
The research suggests that for sensitive and complex topics like health, AI should move beyond being a passive knowledge base. By adopting a "wayfinding" strategy that guides users through their own information needs, AI agents can provide more personalized and empowering experiences that better mirror expert human consultation.
Google Research has introduced REGEN, a benchmark dataset designed to evolve recommender systems from simple item predictors into conversational agents capable of natural language interaction. By augmenting the Amazon Product Reviews dataset with synthetic critiques and narratives using Gemini 1.5 Flash, the researchers provide a framework for training models to understand user feedback and explain their suggestions. The study demonstrates that integrating natural language critiques significantly improves recommendation accuracy while enabling models to generate personalized, context-aware content.
### Composition of the REGEN Dataset
* The dataset enriches the existing Amazon Product Reviews archive by adding synthetic conversational elements, specifically targeting the gap in datasets that support natural language feedback.
* **Critiques** are generated for similar item pairs within hierarchical categories, allowing users to guide the system by requesting specific changes, such as a different color or increased storage.
* **Narratives** provide contextual depth through purchase reasons, product endorsements, and concise user summaries, helping the system justify its recommendations to the end-user.
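The critique step depends on first finding "similar item pairs within hierarchical categories." The sketch below assumes, for illustration, that "similar" means sharing the same full category path. It pairs such items and builds a prompt asking an LLM to write the critique a user might give. The item schema and the prompt wording are hypothetical. The real REGEN prompts and the Gemini 1.5 Flash call are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical item schema: {"id": str, "title": str, "categories": tuple of str}


def similar_item_pairs(items: List[Dict]) -> List[Tuple[Dict, Dict]]:
    """Pair items that share the same full category path (assumed similarity rule)."""
    by_leaf = defaultdict(list)
    for item in items:
        by_leaf[tuple(item["categories"])].append(item)
    pairs = []
    for group in by_leaf.values():
        pairs.extend(combinations(group, 2))  # every unordered pair within a leaf
    return pairs


def critique_prompt(current: Dict, target: Dict) -> str:
    """Illustrative prompt asking an LLM for a critique that steers current -> target."""
    return (
        f"A user was shown '{current['title']}' but would prefer '{target['title']}'. "
        "Write one short critique the user might say to steer toward the second item, "
        "such as asking for a different color or more storage."
    )
```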
### Unified Generative Modeling Approaches
* The researchers framed a "jointly generative" task where models must process a purchase history and optional critique to output both a recommended item ID and a supporting narrative.
* The **FLARE (Hybrid)** architecture uses a sequential recommender based on collaborative filtering to predict the item; that prediction then feeds into a Gemma 2B LLM, which generates the final text narrative.
* The **LUMEN (Unified)** model functions as an end-to-end system where item IDs and text tokens are integrated into a single vocabulary, allowing one LLM to handle critiques, recommendations, and narratives simultaneously.
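The LUMEN description says item IDs and text tokens share a single vocabulary. The sketch below shows what that means for one training example. It assumes a toy whitespace tokenizer and hypothetical `<item_ID>`-style tokens, and the class and function names are illustrative. It is not the LUMEN implementation.

```python
from typing import List, Optional, Tuple


class UnifiedVocab:
    """Joint vocabulary: word tokens plus one reserved token per catalog item."""

    def __init__(self, words: List[str], item_ids: List[str]) -> None:
        self.tokens = (["<pad>", "<sep>"] + sorted(set(words))
                       + [f"<item_{i}>" for i in item_ids])
        self.index = {tok: i for i, tok in enumerate(self.tokens)}

    def item_token(self, item_id: str) -> str:
        return f"<item_{item_id}>"

    def encode(self, tokens: List[str]) -> List[int]:
        return [self.index[t] for t in tokens]

    def decode(self, ids: List[int]) -> List[str]:
        return [self.tokens[i] for i in ids]


def build_example(vocab: UnifiedVocab, history: List[str], critique: Optional[str],
                  target_item: str, narrative: str) -> Tuple[List[int], List[int]]:
    """Source: purchase history plus optional critique.
    Target: the recommended item token followed by the narrative text."""
    source = [vocab.item_token(i) for i in history]
    if critique:
        source += ["<sep>"] + critique.split()
    target = [vocab.item_token(target_item)] + narrative.split()
    return vocab.encode(source), vocab.encode(target)
```

Because the target sequence starts with an item token, a single decoder can emit the recommendation and then continue into the narrative. FLARE, by contrast, keeps item prediction in a separate sequential recommender and uses the LLM only for the text.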
### Performance and Impact of User Feedback
* Incorporating natural language critiques consistently improved recommendation metrics across different architectures, demonstrating that language-guided refinement is a powerful tool for accuracy.
* In the Office domain, the FLARE hybrid model's Recall@10—a measure of how often the desired item appears in the top 10 results—increased from 0.124 to 0.1402 when critiques were included.
* Results indicate that models trained on REGEN can achieve performance comparable to state-of-the-art specialized recommenders while maintaining high-quality natural language generation.
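Recall@10, as described above, checks whether the desired item appears among a model's top 10 results. The sketch below assumes one held-out target item per test case; REGEN's exact evaluation protocol may differ. As a worked example, the Office-domain change from 0.124 to 0.1402 is an absolute gain of 0.0162, or a relative gain of about 13% (0.0162 / 0.124 ≈ 0.131).

```python
from typing import List, Sequence


def recall_at_k(ranked_lists: List[Sequence[str]], targets: List[str], k: int = 10) -> float:
    """Fraction of test cases whose single target item appears in the top-k list."""
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one ranked list per target")
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def relative_change(before: float, after: float) -> float:
    """Relative change of a metric, e.g. Recall@10 without vs. with critiques."""
    return (after - before) / before
```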
The REGEN dataset and the accompanying LUMEN architecture provide a path forward for building more transparent and interactive AI assistants. For developers and researchers, conversational benchmarks like REGEN offer a way to move beyond "black box" recommendations toward systems that can explain their logic and adapt to specific user preferences in real time.