medgemma

3 posts

google

Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR

Google Research has introduced MedGemma 1.5 4B and MedASR, expanding its suite of open medical AI models to support more complex clinical workflows. These updates enhance the interpretation of high-dimensional imaging and add medical speech-to-text, providing a compute-efficient foundation for healthcare developers to build on. By keeping the models open and available on Hugging Face and Vertex AI, Google aims to accelerate the integration of multimodal AI into real-world medical applications.

### Multimodal Advancements in MedGemma 1.5

The latest update to the MedGemma 4B model focuses on high-dimensional and longitudinal data, moving beyond simple 2D image interpretation.

* **3D Medical Imaging:** The model now supports volumetric representations from CT scans and MRIs, as well as whole-slide histopathology imaging.
* **Longitudinal Review:** New capabilities allow for the review of chest X-ray time series, helping clinicians track disease progression over time.
* **Anatomical Localization:** Developers can use the model to identify and localize specific anatomical features within chest X-rays (see the inference sketch after this post).
* **Document Understanding:** Enhanced support for extracting structured data from complex medical lab reports and documents.
* **Edge Capability:** The 4B parameter size is small enough to run offline while remaining accurate enough for core medical reasoning tasks.

### Medical Speech-to-Text with MedASR

MedASR is a specialized automatic speech recognition (ASR) model designed to bridge the gap between clinical dialogue and digital documentation.

* **Clinical Dictation:** The model is fine-tuned for medical terminology and the nuances of clinical dictation.
* **Integrated Reasoning:** MedASR is designed to pair seamlessly with MedGemma, allowing transcribed text to be immediately processed for medical reasoning or summarization (a pipeline sketch follows this post).
* **Accessibility:** Like other HAI-DEF models, it is free for research and commercial use and hosted on both Hugging Face and Google Cloud’s Vertex AI.

### Performance Benchmarks and Community Impact

Google is incentivizing innovation through improved performance metrics and community-driven challenges.

* **Accuracy Gains:** Internal benchmarks show MedGemma 1.5 improved disease-related CT classification by 3% and MRI classification by 14% compared with the previous version.
* **MedGemma Impact Challenge:** A Kaggle-hosted hackathon with $100,000 in prizes has been launched to encourage developers to find creative applications for these multimodal tools.
* **Model Collection:** The update complements existing tools like the MedSigLIP image encoder and the larger MedGemma 27B model, which remains the preferred choice for complex, text-heavy medical applications.

Developers and researchers are encouraged to use MedGemma 1.5 for tasks requiring efficient, offline multimodal processing, and to leverage MedASR to automate clinical documentation. By participating in the MedGemma Impact Challenge, the community can help define the next generation of AI-assisted medical diagnostics and workflows.
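To make the workflow concrete, here is a minimal sketch of multimodal inference with a MedGemma-style checkpoint through Hugging Face `transformers`. The model ID, chat format, and image file are assumptions for illustration; the official model card documents the exact usage for the MedGemma 1.5 release.

```python
# Hedged sketch: chest X-ray interpretation with a MedGemma-style model.
# MODEL_ID is an assumption; check the official Hugging Face model card.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "google/medgemma-4b-it"  # assumed ID; the 1.5 release may differ

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("chest_xray.png")  # hypothetical local image
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe and localize any abnormal findings."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=256)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]  # strip the prompt
print(processor.decode(new_tokens, skip_special_tokens=True))
```

For 3D or longitudinal studies, the same chat format could in principle carry multiple image entries (for example, a series of CT slices or sequential chest X-rays), though the supported input layout should be confirmed against the model documentation.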
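The MedASR-to-MedGemma pairing can be sketched as a two-stage pipeline: transcribe the dictation, then summarize the transcript. The model IDs, the audio file, and MedASR's compatibility with the generic ASR pipeline are assumptions; consult the model cards for the supported interfaces.

```python
# Hedged sketch: clinical dictation -> transcript -> structured note.
from transformers import pipeline

# Stage 1: speech-to-text (model ID assumed for illustration).
asr = pipeline("automatic-speech-recognition", model="google/medasr")
transcript = asr("dictation.wav")["text"]  # hypothetical audio file

# Stage 2: downstream reasoning/summarization with MedGemma (ID assumed).
llm = pipeline("text-generation", model="google/medgemma-4b-it")
prompt = (
    "Rewrite this clinical dictation as a structured note with "
    "History, Findings, Assessment, and Plan sections:\n\n" + transcript
)
print(llm(prompt, max_new_tokens=300)[0]["generated_text"])
```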

google

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments.

## Collaborative Framework and Objectives

* The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa.
* The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward practical healthcare tools.
* Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts.
* Submissions were evaluated on innovation, technical feasibility, and contextual relevance to African health systems.

## Technical Foundations and Google Health Models

* Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks.
* The competition followed a two-phase structure: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase.
* Technical implementations frequently used retrieval-augmented generation (RAG) to keep model outputs aligned with local medical protocols and WHO guidelines (a minimal retrieval sketch follows this post).
* Teams used fine-tuning methods, specifically Low-Rank Adaptation (LoRA), to specialize large models like MedGemma-27B-IT for niche datasets (see the LoRA sketch after this post).

## Innovative Solutions for Regional Health

* **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini-based RAG for clinical guidance.
* **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images.
* **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using speech-to-text (STT) and text-to-speech (TTS) technologies.
* **HexAI (DermaDetect):** Recognized for the best proof of concept, this offline-first mobile app lets community health workers triage skin conditions using on-device versions of MedSigLIP, designed for low-connectivity areas.

The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, providing foundational tools and cloud resources to local experts remains a highly effective strategy for sustainable innovation.
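As an illustration of the RAG pattern the teams described, here is a minimal retrieval step using TF-IDF similarity. The guideline excerpts are hypothetical, and this is a sketch of the general technique rather than any team's actual implementation; production systems would typically use dense embeddings and a vector store.

```python
# Minimal RAG sketch: retrieve the most relevant guideline passage and
# prepend it to the prompt so the model's answer stays grounded in it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

guidelines = [  # hypothetical excerpts from local/WHO protocols
    "WHO recommends cervical cancer screening for women aged 30 to 49.",
    "Suspected malaria should be confirmed by RDT or microscopy first.",
]
vectorizer = TfidfVectorizer().fit(guidelines)
doc_vecs = vectorizer.transform(guidelines)

def retrieve(query: str) -> str:
    """Return the guideline passage most similar to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_vecs)
    return guidelines[sims.argmax()]

question = "Who should be screened for cervical cancer?"
context = retrieve(question)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using the context."
# `prompt` would then be sent to MedGemma (e.g. via Vertex AI or transformers).
print(prompt)
```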
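The LoRA approach can be sketched with the `peft` library. The model ID, target module names, and hyperparameters below are assumptions for illustration; the CRIC data pipeline is omitted, and loading a 27B checkpoint additionally requires substantial accelerator memory.

```python
# Hedged sketch: attaching LoRA adapters to a large instruction-tuned model
# so only a small set of low-rank weights needs to be trained.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "google/medgemma-27b-it"  # assumed Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)  # used by the training loop
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed names)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# From here, a standard Trainer/SFTTrainer loop on the task dataset applies.
```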

google

AfriMed-QA: Benchmarking large language models for global health

AfriMed-QA is a comprehensive benchmarking suite designed to address the critical gap in medical LLM evaluation for African healthcare contexts. Developed through a partnership between Google Research and a pan-African consortium, the project demonstrates that current models often struggle with geographic shifts in disease distribution and with localized linguistic nuances. The researchers conclude that diverse, region-specific datasets are essential for training equitable AI tools that can safely provide clinical decision support in low-resource settings.

## Limitations of Western-Centric Benchmarks

* Existing medical benchmarks such as USMLE MedQA focus on Western clinical contexts, which may not generalize to other regions.
* Models trained on traditional datasets often fail to account for regional shifts in disease distribution and cultural differences in how symptoms are described.
* The lack of diverse data makes it difficult to assess how LLMs handle linguistic variation, even when the primary language is English.

## The AfriMed-QA Dataset Composition

* The dataset contains approximately 15,000 clinically diverse questions and answers sourced from 16 African countries.
* It covers 32 medical specialties, ranging from neurosurgery and internal medicine to infectious diseases and obstetrics.
* The content spans three distinct formats: 4,000+ expert multiple-choice questions (MCQs), 1,200 open-ended short-answer questions (SAQs), and 10,000 consumer-style queries.
* Data was crowdsourced from 621 contributors across 60 medical schools to ensure broad representation of the continent’s medical landscape.

## Data Collection and Curation Methodology

* Researchers adapted a specialized web-based platform, originally built by Intron Health, to facilitate large-scale crowdsourcing across regions.
* To protect privacy, consumer queries were generated by prompting users with specific disease scenarios rather than asking for personal health information.
* The curation process included custom user interfaces for quality review and blinded human evaluations by clinical experts to ensure the accuracy of reference answers.

## LLM Performance and Evaluation Results

* The study benchmarked 30 general and biomedical LLMs, evaluating them for accuracy, semantic similarity, and human preference (a minimal accuracy harness is sketched after this post).
* A significant performance gap exists between model sizes; larger models consistently outperformed smaller ones on the AfriMed-QA benchmark.
* This trend highlights a challenge for low-resource settings, where smaller, specialized models are often preferred for on-device or edge deployment due to infrastructure constraints.
* The dataset has already been used to improve Google’s MedGemma, demonstrating its utility in training multimodal medical models.

The AfriMed-QA benchmark datasets and evaluation code have been open-sourced on Hugging Face and GitHub to support the global research community. Developers are encouraged to use these tools to build and refine medical AI that is more inclusive and effective for the Global South.
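As a starting point, the published data can be pulled with the Hugging Face `datasets` library. The dataset ID and field names below are assumptions for illustration; the project's Hugging Face page documents the actual schema.

```python
# Hedged sketch: loading AfriMed-QA from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("intronhealth/afrimedqa_v2", split="train")  # assumed ID
print(ds.column_names)  # inspect the published schema first

# Filter to one question format; "question_type"/"mcq" are assumed values.
mcqs = ds.filter(lambda row: row.get("question_type") == "mcq")
print(f"{len(mcqs)} multiple-choice questions")
```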
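A minimal version of the MCQ accuracy evaluation might look like the following. `generate_answer` is a hypothetical placeholder for whichever model API is under test, and the record schema (question text, lettered options, answer key) is assumed; the project's open-sourced evaluation code is the authoritative reference.

```python
# Hedged sketch: scoring a model on lettered multiple-choice questions.
import re

def generate_answer(prompt: str) -> str:
    """Placeholder: call the LLM under evaluation and return its raw text."""
    raise NotImplementedError

def mcq_accuracy(questions: list[dict]) -> float:
    """Fraction of questions where the model's chosen letter matches the key."""
    correct = 0
    for q in questions:  # each q: {"question", "options", "answer"} (assumed)
        options = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
        raw = generate_answer(
            f"{q['question']}\n{options}\nAnswer with a single letter."
        )
        match = re.search(r"\b([A-E])\b", raw)  # extract the chosen letter
        if match and match.group(1) == q["answer"]:
            correct += 1
    return correct / len(questions)

# Example: mcq_accuracy([{"question": "...", "options": {"A": "...", "B": "..."},
#                         "answer": "A"}])
```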