fine-tuning

7 posts

google

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments.

## Collaborative Framework and Objectives

* The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa.
* The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward the execution of practical healthcare tools.
* Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts.
* Submissions were evaluated on innovation, technical feasibility, and contextual relevance to African health systems.

## Technical Foundations and Google Health Models

* Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks.
* The competition followed a two-phase journey: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase.
* Technical implementations frequently involved advanced techniques such as Retrieval-Augmented Generation (RAG) to ensure alignment with local medical protocols and WHO guidelines.
* Fine-tuning methods, specifically Low-Rank Adaptation (LoRA), were used by teams to specialize large-scale models like MedGemma-27B-IT for niche datasets (a minimal sketch follows this summary).

## Innovative Solutions for Regional Health

* **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini RAG for clinical guidance.
* **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images.
* **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using Speech-to-Text (STT) and Text-to-Speech (TTS) technologies.
* **HexAI (DermaDetect):** Recognized for the best proof-of-concept, this offline-first mobile app allows community health workers to triage skin conditions using on-device versions of MedSigLIP, specifically designed for low-connectivity areas.

The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, this model of providing foundational tools and cloud resources to local experts remains a highly effective strategy for sustainable innovation.
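As a concrete illustration of the LoRA approach mentioned above, here is a minimal adapter-setup sketch using the Hugging Face PEFT library. The hub ID, target modules, and hyperparameters are assumptions for illustration, not the teams' actual configurations.

```python
# A minimal LoRA sketch with Hugging Face PEFT; the hub ID, target modules,
# and hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "google/medgemma-27b-it"  # assumed Hugging Face hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of the full 27B weights.
lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # scaling factor for adapter outputs
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Because only the small adapter matrices receive gradients, a 27B-parameter model can be specialized on a niche clinical dataset at a fraction of the memory and compute that full fine-tuning would require.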

aws

Amazon Bedrock adds reinforcement fine-tuning, simplifying how developers build smarter, more accurate AI models | AWS News Blog

Amazon Bedrock has introduced reinforcement fine-tuning, a new model customization capability that allows developers to build more accurate and cost-effective AI models using feedback-driven training. By moving away from the requirement for massive labeled datasets in favor of reward signals, the platform enables average accuracy gains of 66% while automating the complex infrastructure typically associated with advanced machine learning. This approach allows organizations to optimize smaller, faster models for specific business needs without sacrificing performance or incurring the high costs of larger model variants.

**Challenges of Traditional Model Customization**

* Traditional fine-tuning often requires massive, high-quality labeled datasets and expensive human annotation, which can be a significant barrier for many organizations.
* Developers previously had to choose between settling for generic "out-of-the-box" results and managing the high costs and complexity of large-scale infrastructure.
* Advanced reinforcement learning techniques carried a high barrier to entry, often requiring specialized ML expertise that many development teams lack.

**Mechanics of Reinforcement Fine-Tuning**

* The system uses an iterative feedback loop where models improve based on reward signals that judge the quality of responses against specific business requirements.
* Reinforcement Learning with Verifiable Rewards (RLVR) uses rule-based graders to provide objective feedback for tasks such as mathematics or code generation (a toy grader is sketched after this summary).
* Reinforcement Learning from AI Feedback (RLAIF) uses AI-driven evaluations to help models understand preference and quality without manual human intervention.
* The workflow can be powered by existing API logs within Amazon Bedrock or by uploading training datasets, eliminating the need for complex infrastructure setup.

**Performance and Security Advantages**

* The technique achieves an average accuracy improvement of 66% over base models, enabling smaller models to perform at the level of much larger alternatives.
* Current support includes the Amazon Nova 2 Lite model, which helps developers optimize for both speed and price-to-performance.
* All training data and customization processes remain within the secure AWS environment, ensuring that proprietary data is protected and compliant with organizational security standards.

Developers should consider reinforcement fine-tuning as a primary strategy for optimizing smaller models like Amazon Nova 2 Lite to achieve high-tier performance at a lower cost. This capability is particularly recommended for specialized tasks like reasoning and coding, where objective reward functions can be used to rapidly iterate and improve model accuracy.
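To make the RLVR idea concrete, here is a toy rule-based grader of the kind described above. It is a generic, self-contained illustration of a verifiable reward signal, not Amazon Bedrock's actual grader interface.

```python
# A minimal sketch of the rule-based grading idea behind RLVR: the reward is
# computed by a reproducible check rather than a human label. This grader is
# a hypothetical example, not Bedrock's API.
import re

def math_grader(model_response: str, expected_answer: str) -> float:
    """Return 1.0 if the final number in the response matches the expected
    answer, else 0.0 -- an objective, verifiable signal."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_response)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected_answer else 0.0

# During reinforcement fine-tuning, each sampled response is scored and the
# policy is updated to make high-reward outputs more likely.
print(math_grader("The answer is 42.", "42"))  # 1.0
print(math_grader("I think it's 41.", "42"))   # 0.0
```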

aws

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning | AWS News Blog

Amazon SageMaker AI has introduced a new serverless customization capability designed to accelerate the fine-tuning of popular models like Llama, DeepSeek, and Amazon Nova. By automating resource provisioning and providing an intuitive interface for advanced reinforcement learning techniques, this feature reduces the model customization lifecycle from months to days. This end-to-end workflow allows developers to focus on model performance rather than infrastructure management, from initial training through to final deployment.

**Automated Infrastructure and Model Support**

* The service provides a serverless environment where SageMaker AI automatically selects and provisions compute resources based on the specific model architecture and dataset size.
* Supported models include a broad range of high-performance options such as Amazon Nova, DeepSeek, GPT-OSS, Meta Llama, and Qwen.
* The feature is accessible directly through the Amazon SageMaker Studio interface, allowing users to manage their entire model catalog in one location.

**Advanced Customization and Reinforcement Learning**

* Users can choose from several fine-tuning techniques, including traditional Supervised Fine-Tuning (SFT) and more advanced methods.
* The platform supports modern optimization techniques such as Direct Preference Optimization (DPO), Reinforcement Learning with Verifiable Rewards (RLVR), and Reinforcement Learning from AI Feedback (RLAIF); the DPO objective is sketched after this summary.
* To simplify the process, SageMaker AI provides recommended defaults for hyperparameters like batch size, learning rate, and epochs based on the selected tuning technique.

**Experiment Tracking and Security**

* The workflow introduces a serverless MLflow application, enabling seamless experiment tracking and performance monitoring without additional setup.
* Advanced configuration options allow for fine-grained control over network encryption and storage volume encryption to ensure data security.
* The "Continue customization" feature allows for iterative tuning, where users can adjust hyperparameters or apply different techniques to an existing customized model.

**Evaluation and Deployment Flexibility**

* Built-in evaluation tools allow developers to compare the performance of their customized models against the original base models to verify improvements.
* Once a model is finalized, it can be deployed with a few clicks to either Amazon SageMaker or Amazon Bedrock.
* A centralized "My Models" dashboard tracks all custom iterations, providing detailed logs and status updates for every training and evaluation job.

This serverless approach is highly recommended for teams that need to adapt large language models to specific domains quickly without the operational overhead of managing GPU clusters. By utilizing the integrated evaluation and multi-platform deployment options, organizations can transition from experimentation to production-ready AI more efficiently.
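For readers unfamiliar with DPO, the following standalone PyTorch sketch shows the published DPO objective behind the technique the service exposes; it illustrates the loss itself, not SageMaker's internal implementation, and the toy log-probabilities are invented for the example.

```python
# A minimal sketch of the Direct Preference Optimization (DPO) loss:
# push the policy to prefer the chosen response over the rejected one,
# relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of policy vs. reference for each response in the pair.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-1.0, -0.5]), torch.tensor([-2.0, -1.5]),
                torch.tensor([-1.2, -0.8]), torch.tensor([-1.8, -1.2]))
print(loss.item())
```

Unlike classic RLHF, this objective needs no separate reward model, which is one reason preference-based tuning lends itself to a managed, serverless workflow.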

google

Learn Your Way: Reimagining textbooks with generative AI

Google Research has introduced Learn Your Way, an AI-driven educational experiment that reimagines traditional textbooks as personalized, multimodal learning journeys. By leveraging the LearnLM family of models integrated into Gemini 2.5 Pro, the system transforms static source material into tailored content based on a student’s specific grade level and interests. Early efficacy studies demonstrate that this approach significantly enhances retention, with students scoring 11 percentage points higher than those using standard digital readers.

### Pedagogical Foundations and Dual Coding

The research is built on "dual coding theory," which suggests that forming mental connections between different representations of information strengthens conceptual understanding.

* The system moves away from a "one-size-fits-all" model toward a student-driven experience where learners can choose and intermix formats.
* Personalization is used as a tool to enhance situational interest and motivation by adapting content to specific student attributes.
* The framework incorporates active learning through real-time quizzing and feedback to address knowledge gaps as they arise.

### The Personalization Pipeline

The technical architecture begins with a layered pipeline that processes source material, such as a textbook PDF, to create a foundational text for all other formats (a pipeline sketch follows this summary).

* The original material is first "re-leveled" to match the learner’s reported grade level while maintaining the integrity and scope of the curriculum.
* Generic examples within the text are strategically replaced with personalized examples based on user interests, such as sports, music, or food.
* This personalized base text serves as the primary input for generating all subsequent multimodal representations, ensuring consistency across formats.

### Multimodal Content Generation

To produce a wide variety of educational assets, the system utilizes a combination of large language models and specialized AI agents.

* **Agentic Workflows:** While tools like mind maps and timelines are generated directly by Gemini, complex assets like narrated slides use multi-step agentic workflows to ensure pedagogical effectiveness.
* **Custom Visuals:** Because general-purpose image models often struggle with educational accuracy, the researchers fine-tuned a dedicated model specifically for generating educational illustrations.
* **Diverse Representations:** The interface provides "immersive text" with embedded questions, audio lessons for auditory learning, and interactive slides that mimic recorded classroom sessions.

### Research Outcomes and Future Application

The project’s effectiveness was validated through a study comparing the GenAI approach against standard digital reading materials.

* Students using the personalized AI tools showed a significant improvement in retention test scores.
* Beyond retention, the system aims to transform passive reading into an active, multimodal experience that follows established learning science principles.
* The "Learn Your Way" experiment is currently available on Google Labs, providing a practical look at how adaptive, learner-centric materials might replace static textbooks in future K-12 and higher education settings.
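A rough sketch of the layered pipeline described above, assuming a hypothetical `call_llm` helper in place of whatever model endpoint Learn Your Way actually uses; the prompts and function names are illustrative only, not Google's implementation.

```python
# A sketch of the layered personalization pipeline: re-level first, then
# personalize, producing the base text for all multimodal formats.
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM endpoint, return text."""
    raise NotImplementedError("wire up your model endpoint here")

def personalize(source_text: str, grade_level: str, interest: str) -> str:
    # Step 1: re-level the source to the learner's grade, keeping curriculum scope.
    leveled = call_llm(
        f"Rewrite for a {grade_level} reading level, preserving all "
        f"concepts and curriculum scope:\n\n{source_text}"
    )
    # Step 2: swap generic examples for ones tied to the learner's interest.
    personalized = call_llm(
        f"Replace generic examples with examples about {interest}, "
        f"without changing the underlying concepts:\n\n{leveled}"
    )
    # The result is the foundational text from which mind maps, narrated
    # slides, quizzes, and audio lessons are all generated.
    return personalized
```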

google

How Google’s AI can help transform health professions education

To address a projected global deficit of 11 million healthcare workers by 2030, Google Research is exploring how generative AI can provide personalized, competency-based education for medical professionals. By combining qualitative user-centered design with quantitative benchmarking of the pedagogically fine-tuned LearnLM model, researchers have demonstrated that AI can effectively mimic the behaviors of high-quality human tutors. The studies conclude that specialized models, now integrated into Gemini 2.5 Pro, can significantly enhance clinical reasoning and adapt to the individual learning styles of medical students.

## Learner-Centered Design and Participatory Research

* Researchers conducted interdisciplinary co-design workshops featuring medical students, clinicians, and AI researchers to identify specific educational needs.
* The team developed a rapid prototype of an AI tutor designed to guide learners through clinical reasoning exercises anchored in synthetic clinical vignettes.
* Qualitative feedback from medical residents and students highlighted a demand for "preceptor-like" behaviors, such as the ability to manage cognitive load, provide constructive feedback, and encourage active reflection.
* Analysis revealed that learners specifically value AI tools that can identify and bridge individual knowledge gaps rather than providing generic information.

## Quantitative Benchmarking via LearnLM

* The study utilized LearnLM, a version of Gemini fine-tuned specifically for educational pedagogy, and compared its performance against Gemini 1.5 Pro.
* Evaluations were conducted using 50 synthetic scenarios covering a spectrum of medical education, ranging from preclinical topics like platelet activation to clinical subjects such as neonatal jaundice.
* Medical students engaged in 290 role-playing conversations, which were then evaluated based on four primary metrics: overall experience, meeting learning needs, enjoyability, and understandability.
* Physician educators performed blinded reviews of conversation transcripts to assess whether the AI adhered to medical education standards and core competencies.

## Pedagogical Performance and Expert Evaluation

* LearnLM was consistently rated higher than the base model by both students and educators, with experts noting it behaved "more like a very good human tutor."
* The fine-tuned model demonstrated a superior ability to maintain a conversation plan and use grounding materials to provide accurate, context-aware instruction.
* Findings suggest that pedagogical fine-tuning is essential for AI to move beyond simple fact delivery and toward true interactive tutoring.
* These specialized learning capabilities have been transitioned from the research phase into Gemini 2.5 Pro to support broader educational applications.

By integrating these specialized AI behaviors into medical training pipelines, institutions can provide scalable, individualized support to students. The transition of LearnLM’s pedagogical features into Gemini 2.5 Pro provides a practical framework for developers to create tools that not only provide medical information but actively foster the critical thinking skills required for clinical practice.

google

Beyond billion-parameter burdens: Unlocking data synthesis with a conditional generator

The CTCL (Data Synthesis with ConTrollability and CLustering) framework provides a lightweight alternative to the computationally expensive process of fine-tuning billion-parameter models for differentially private synthetic data generation. By utilizing a 140-million-parameter generator and a universal topic model, the system achieves high-quality distribution matching while remaining accessible for resource-constrained applications. This approach allows for the generation of unlimited synthetic samples without incurring additional privacy costs, consistently outperforming existing API-based and large-scale baselines under strict privacy guarantees.

### Pre-training Universal Components

The framework relies on two core components developed using large-scale public corpora, which can be reused across different private domains:

* **CTCL-Topic:** A universal topic model derived from Wikipedia documents. It uses BERTopic to embed and cluster data into approximately 1,000 distinct topics, each represented by 10 descriptive keywords.
* **CTCL-Generator:** A conditional language model based on the 140M-parameter BART-base architecture. It was pre-trained on 430 million description–document pairs from the SlimPajama dataset, with descriptions generated by Gemma-2-2B to ensure the model can generate text based on specific input conditions.

### Learning the Private Domain

Once the universal components are established, the framework learns the specific characteristics of a private dataset through a two-step process (the histogram step is sketched after this summary):

* **Differentially Private (DP) Histograms:** The system captures high-level distributional information by creating a DP-protected histogram that represents the percentage of each topic present in the private corpus.
* **DP Fine-Tuning:** Each document in the private dataset is associated with its corresponding keywords from the CTCL-Topic model. The CTCL-Generator is then fine-tuned on these keyword–document pairs using differential privacy to ensure individual data points are protected.

### Controllable Data Generation

The final stage involves producing the synthetic dataset by sampling from the fine-tuned generator:

* **Proportional Sampling:** The system generates data by targeting the exact topic proportions found in the private-domain histogram.
* **Keyword Conditioning:** For each topic, the model uses the associated 10 keywords as input to prompt the DP fine-tuned generator to produce relevant documents.
* **Post-Processing Efficiency:** Because the generator is already fine-tuned with DP, the framework can generate an unlimited number of synthetic samples without further privacy budget expenditure, a significant advantage over iterative selection algorithms.

CTCL offers a highly scalable and efficient solution for organizations needing to synthesize private text data without the infrastructure requirements of massive LLMs. Its ability to maintain topic-wise distribution through keyword conditioning makes it an ideal choice for specialized domains where maintaining the statistical utility of the data is as critical as protecting user privacy.
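The following sketch illustrates the two distribution-matching steps referenced above: a DP topic histogram of the private corpus and proportional sampling from it. The Laplace-noise mechanism, epsilon value, and helper names are assumptions for illustration, not the paper's exact algorithm.

```python
# A minimal sketch of CTCL's distributional steps: build a noisy (DP) topic
# histogram of the private corpus, then allocate synthetic samples per topic.
import numpy as np

def dp_topic_histogram(topic_ids, num_topics, epsilon=1.0):
    """Count topic occurrences, add Laplace noise (assuming each document
    contributes to exactly one topic bin), clip, and normalize."""
    counts = np.bincount(topic_ids, minlength=num_topics).astype(float)
    counts += np.random.laplace(scale=1.0 / epsilon, size=num_topics)
    counts = np.clip(counts, 0, None)
    return counts / counts.sum()

def sample_generation_plan(proportions, total_samples):
    """Allocate synthetic documents to topics in proportion to the DP
    histogram; each topic's 10 keywords then condition the generator."""
    return np.random.multinomial(total_samples, proportions)

topics = np.random.randint(0, 1000, size=5000)  # toy private-corpus topic labels
plan = sample_generation_plan(dp_topic_histogram(topics, 1000), 10000)
print(plan[:10])  # documents to generate for the first 10 topics
```

Because the privacy cost is paid once, in the histogram and the DP fine-tuning, sampling from this plan can be repeated indefinitely without touching the privacy budget.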

google

Achieving 10,000x training data reduction with high-fidelity labels

Google Ads researchers have developed a scalable active learning curation process that reduces the volume of training data required for fine-tuning LLMs by up to four orders of magnitude. By iteratively identifying the most informative and diverse examples through clustering and expert review, the method achieves significantly higher human-model alignment than traditional large-scale crowdsourced datasets. This approach effectively addresses the high costs and complexities of classifying ambiguous content, such as unsafe ads, where high-fidelity data is scarce and concept drift is frequent.

### The Iterative Curation Process

* **Initial Labeling:** The process begins with a zero- or few-shot model (LLM-0) that generates a large, typically imbalanced dataset of "positive" and "benign" labels.
* **Clustering and Confusion Identification:** Separate clusters are created for each label set; overlapping clusters indicate areas where the model is confused.
* **Expert Sampling:** Human experts review pairs of examples located near the decision boundary of these overlapping clusters, prioritizing those that cover a larger area of the search space to ensure diversity.
* **Recursive Refinement:** Expert labels are split into fine-tuning and evaluation sets; the model is retrained and the process repeats until model-human alignment plateaus or matches internal expert agreement.

### Measuring Alignment via Cohen’s Kappa

* **Metric Selection:** Because ad safety is often subjective, the researchers use Cohen’s Kappa instead of precision and recall to measure how well two independent annotators align beyond chance (a computation sketch follows this summary).
* **Performance Benchmarks:** A Kappa value above 0.8 is considered exceptional, while 0.4 is the minimum for acceptability.
* **Goal Alignment:** The curation process aims to move model performance toward the "ceiling" of internal human agreement (which measured between 0.78 and 0.81 in these experiments).

### Experimental Results and Efficiency

* **Model Scaling:** Experiments involved fine-tuning Gemini Nano-1 (1.8B parameters) and Nano-2 (3.25B parameters) on tasks of varying complexity.
* **Drastic Data Reduction:** The curated method reached performance plateaus using fewer than 500 expert-labeled examples, compared to a baseline of 100,000 crowdsourced labels.
* **Quality Gains:** Despite using 10,000x less data, the curated models saw up to a 65% improvement in alignment with human experts over the crowdsourced baselines.
* **Class Balancing:** The process naturally corrected for production imbalances, moving from <1% positive examples in raw traffic to ~40% in the final curated sets.

This curation method is a highly effective strategy for organizations managing high-stakes classification tasks where "ground truth" is subjective or data curation is prohibitively expensive. By shifting focus from data quantity to the quality and diversity of examples at the decision boundary, developers can maintain high-performing models that adapt quickly to evolving safety policies.
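As a quick illustration of the alignment metric, this snippet computes Cohen's Kappa with scikit-learn on toy labels; the label names and values are invented for the example, and the interpretation thresholds follow the benchmarks cited above (0.8 exceptional, 0.4 minimum acceptable).

```python
# A minimal sketch of the Cohen's Kappa agreement metric on toy annotations:
# it measures agreement between two annotators beyond what chance predicts.
from sklearn.metrics import cohen_kappa_score

expert_labels = ["unsafe", "benign", "unsafe", "benign", "benign", "unsafe"]
model_labels  = ["unsafe", "benign", "unsafe", "unsafe", "benign", "unsafe"]

kappa = cohen_kappa_score(expert_labels, model_labels)
print(f"kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0.0 = chance level
```

Unlike raw accuracy, Kappa discounts agreement that would occur by chance under each annotator's label distribution, which is why it suits subjective tasks like ad-safety classification.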