vertex-ai

4 posts

google

Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR (opens in new tab)

Google Research has introduced MedGemma 1.5 4B and MedASR, expanding its suite of open medical AI models to support more complex clinical workflows. These updates significantly enhance the interpretation of high-dimensional imaging and medical speech-to-text, providing a compute-efficient foundation for healthcare developers to build upon. By maintaining an open-access model available on Hugging Face and Vertex AI, Google aims to accelerate the integration of multimodal AI into real-world medical applications. ### Multimodal Advancements in MedGemma 1.5 The latest update to the MedGemma 4B model focuses on high-dimensional and longitudinal data, moving beyond simple 2D image interpretation. * **3D Medical Imaging:** The model now supports volumetric representations from CT scans and MRIs, as well as whole-slide histopathology imaging. * **Longitudinal Review:** New capabilities allow for the review of chest X-ray time series, helping clinicians track disease progression over time. * **Anatomical Localization:** Developers can use the model to identify and localize specific anatomical features within chest X-rays. * **Document Understanding:** Enhanced support for extracting structured data from complex medical lab reports and documents. * **Edge Capability:** The 4B parameter size is specifically designed to be small enough to run offline while remaining accurate enough for core medical reasoning tasks. ### Medical Speech-to-Text with MedASR MedASR is a specialized automated speech recognition (ASR) model designed to bridge the gap between clinical dialogue and digital documentation. * **Clinical Dictation:** The model is specifically fine-tuned for medical terminology and the unique nuances of clinical dictation. * **Integrated Reasoning:** MedASR is designed to pair seamlessly with MedGemma, allowing transcribed text to be immediately processed for advanced medical reasoning or summarization. * **Accessibility:** Like other HAI-DEF models, it is free for research and commercial use and hosted on both Hugging Face and Google Cloud’s Vertex AI. ### Performance Benchmarks and Community Impact Google is incentivizing innovation through improved performance metrics and community-driven challenges. * **Accuracy Gains:** Internal benchmarks show MedGemma 1.5 improved disease-related CT classification by 3% and MRI classification by 14% compared to the previous version. * **MedGemma Impact Challenge:** A Kaggle-hosted hackathon with $100,000 in prizes has been launched to encourage developers to find creative applications for these multimodal tools. * **Model Collection:** The update complements existing tools like the MedSigLIP image encoder and the larger MedGemma 27B model, which remains the preferred choice for complex, text-heavy medical applications. Developers and researchers are encouraged to utilize MedGemma 1.5 for tasks requiring efficient, offline multimodal processing, while leveraging MedASR to automate clinical documentation. By participating in the MedGemma Impact Challenge, the community can help define the next generation of AI-assisted medical diagnostics and workflows.

google

Google Research 2025: Bolder breakthroughs, bigger impact (opens in new tab)

Google Research in 2025 has shifted toward an accelerated "Magic Cycle" that rapidly translates foundational breakthroughs into real-world applications across science, society, and consumer products. By prioritizing model efficiency, factuality, and agentic capabilities, the organization is moving beyond static text generation toward interactive, multi-modal systems that solve complex global challenges. This evolution is underpinned by a commitment to responsible AI development, ensuring that new technologies like quantum computing and generative UI are both safe and culturally inclusive. ## Enhancing Model Efficiency and Factuality * Google introduced new efficiency-focused techniques like block verification (an evolution of speculative decoding) and the LAVA scheduling algorithm, which optimizes resource allocation in large cloud data centers. * The Gemini 3 model achieved state-of-the-art results on factuality benchmarks, including SimpleQA Verified and the newly released FACTS benchmark suite, by emphasizing grounded world knowledge. * Research into Retrieval Augmented Generation (RAG) led to the development of the LLM Re-Ranker in Vertex AI, which helps models determine if they possess sufficient context to provide accurate answers. * The Gemma open model expanded to support over 140 languages, supported by the TUNA taxonomy and the Amplify initiative to improve socio-cultural intelligence and data representation. ## Interactive Experiences through Generative UI * A novel implementation of generative UI allows Gemini 3 to dynamically create visual interfaces, web pages, and tools in response to user prompts rather than providing static text. * This technology is powered by specialized models like "Gemini 3-interactive," which are trained to output structured code and design elements. * These capabilities have been integrated into AI Mode within Google Search, allowing for more immersive and customizable user journeys. ## Advanced Architectures and Agentic AI * Google is exploring hybrid model architectures, such as Jamba-style models that combine State Space Models (SSMs) with traditional attention mechanisms to handle long contexts more efficiently. * The development of agentic AI focuses on models that can reason, plan, and use tools, exemplified by Project Astra, a prototype for a universal AI agent. * Specialized models like Gemini 3-code have been optimized to act as autonomous collaborators for software developers, assisting in complex coding tasks and system design. ## AI for Science and Planetary Health * In biology, research teams utilized AI to map human heart and brain structures and employed RoseTTAFold-Diffusion to design new proteins for therapeutic use. * The NeuralGCM model has revolutionized Earth sciences by combining traditional physics with machine learning for faster, more accurate weather and climate forecasting. * Environmental initiatives include the FireSat satellite constellation for global wildfire detection and the expansion of AI-driven flood forecasting and contrail mitigation. ## Quantum Computing and Responsible AI * Google achieved significant milestones in quantum error correction, developing low-overhead codes that bring the industry closer to a reliable, large-scale quantum computer. * Security and safety remain central, with the expansion of SynthID—a watermarking tool for AI-generated text, audio, and video—to help users identify synthetic content. * The team continues to refine the Secure AI Framework (SAIF) to defend against emerging threats while promoting the safe deployment of generative media models like Veo and Imagen. To maximize the impact of these advancements, organizations should focus on integrating agentic workflows and RAG-based architectures to ensure their AI implementations are both factual and capable of performing multi-step tasks. Developers can leverage the Gemma open models to build culturally aware applications that scale across diverse global markets.

google

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa (opens in new tab)

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments. ## Collaborative Framework and Objectives * The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa. * The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward the execution of practical healthcare tools. * Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts. * Submissions were evaluated based on their innovation, technical feasibility, and contextual relevance to African health systems. ## Technical Foundations and Google Health Models * Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks. * The competition utilized a two-phase journey: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase. * Technical implementations frequently involved advanced techniques such as Retrieval-Augmented Generation (RAG) to ensure alignment with local medical protocols and WHO guidelines. * Fine-tuning methods, specifically Low-Rank Adaptation (LoRA), were utilized by teams to specialize large-scale models like MedGemma-27B-IT for niche datasets. ## Innovative Solutions for Regional Health * **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini RAG for clinical guidance. * **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images. * **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using Speech-to-Text (STT) and Text-to-Speech (TTS) technologies. * **HexAI (DermaDetect):** Recognized for the best proof-of-concept, this offline-first mobile app allows community health workers to triage skin conditions using on-device versions of MedSigLIP, specifically designed for low-connectivity areas. The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, this model of providing foundational tools and cloud resources to local experts remains a highly effective strategy for sustainable innovation.

google

Deeper insights into retrieval augmented generation: The role of sufficient context (opens in new tab)

Google Research has introduced "sufficient context" as a critical new metric for evaluating Retrieval Augmented Generation (RAG) systems, arguing that simple relevance is an inadequate measure of performance. By focusing on whether a retrieved context contains all the necessary information to definitively answer a query, researchers developed an LLM-based autorater that classifies context sufficiency with 93% accuracy. This framework reveals that many RAG failures, specifically hallucinations, occur because models fail to abstain from answering when information is incomplete or contradictory. ## Defining and Measuring Sufficient Context * Sufficient context is defined as containing all information necessary to provide a definitive answer, while insufficient context is relevant but incomplete, inconclusive, or contradictory. * The researchers developed an "autorater" using Gemini 1.5 Pro, utilizing chain-of-thought prompting and 1-shot examples to evaluate query-context pairs. * In benchmarks against human expert "gold standard" labels, the autorater achieved 93% accuracy, outperforming specialized models like FLAMe (fine-tuned PaLM 24B) and NLI-based methods. * Unlike traditional metrics, this approach does not require ground-truth answers to evaluate the quality of the retrieved information. ## RAG Failure Modes and Abstention Challenges * State-of-the-art models (Gemini, GPT, Claude) perform exceptionally well when provided with sufficient context but struggle when context is lacking. * The primary driver of hallucinations in RAG systems is the "abstention" problem, where a model attempts to answer a query based on insufficient context rather than stating "I don't know." * Analyzing model responses through the lens of sufficiency allows developers to distinguish between "knowledge" (the model knows the answer internally) and "grounding" (the model correctly uses the provided context). ## Implementation in Vertex AI * The insights from this research have been integrated into the Vertex AI RAG Engine via a new LLM Re-Ranker feature. * The re-ranker prioritizes retrieved snippets based on their likelihood of providing a sufficient answer, significantly improving retrieval metrics such as normalized Discounted Cumulative Gain (nDCG). * By filtering for sufficiency during the retrieval phase, the system reduces the likelihood that the LLM will be forced to process misleading or incomplete data. To minimize hallucinations and improve the reliability of RAG applications, developers should move beyond keyword-based relevance and implement re-ranking stages that specifically evaluate context sufficiency. Ensuring that an LLM has the "right" to answer based on the provided data—and training it to abstain when that data is missing—is essential for building production-grade generative AI tools.