agentic-workflows

2 posts

google

Learn Your Way: Reimagining textbooks with generative AI

Google Research has introduced Learn Your Way, an AI-driven educational experiment that reimagines traditional textbooks as personalized, multimodal learning journeys. By leveraging the LearnLM family of models integrated into Gemini 2.5 Pro, the system transforms static source material into tailored content based on a student’s specific grade level and interests. Early efficacy studies demonstrate that this approach significantly enhances retention, with students scoring 11 percentage points higher than those using standard digital readers.

### Pedagogical Foundations and Dual Coding

The research is built on the "dual coding theory," which suggests that forming mental connections between different representations of information strengthens conceptual understanding.

* The system moves away from a "one-size-fits-all" model toward a student-driven experience where learners can choose and intermix formats.
* Personalization is used as a tool to enhance situational interest and motivation by adapting content to specific student attributes.
* The framework incorporates active learning through real-time quizzing and feedback to address knowledge gaps as they arise.

### The Personalization Pipeline

The technical architecture begins with a layered pipeline that processes source material, such as a textbook PDF, to create a foundational text for all other formats.

* The original material is first "re-leveled" to match the learner’s reported grade level while maintaining the integrity and scope of the curriculum.
* Generic examples within the text are strategically replaced with personalized examples based on user interests, such as sports, music, or food.
* This personalized base text serves as the primary input for generating all subsequent multimodal representations, ensuring consistency across formats.

### Multimodal Content Generation

To produce a wide variety of educational assets, the system utilizes a combination of large language models and specialized AI agents.

* **Agentic Workflows:** While tools like mind maps and timelines are generated directly by Gemini, complex assets like narrated slides use multi-step agentic workflows to ensure pedagogical effectiveness.
* **Custom Visuals:** Because general-purpose image models often struggle with educational accuracy, the researchers fine-tuned a dedicated model specifically for generating educational illustrations.
* **Diverse Representations:** The interface provides "immersive text" with embedded questions, audio lessons for auditory learning, and interactive slides that mimic recorded classroom sessions.

### Research Outcomes and Future Application

The project’s effectiveness was validated through a study comparing the GenAI approach against standard digital reading materials.

* Students using the personalized AI tools showed a significant improvement in retention test scores.
* Beyond retention, the system aims to transform passive reading into an active, multimodal experience that follows established learning science principles.
* The "Learn Your Way" experiment is currently available on Google Labs, providing a practical look at how adaptive, learner-centric materials might replace static textbooks in future K-12 and higher education settings.
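
The layered personalization pipeline described in this summary (re-level the source, swap in interest-based examples, then derive every format from that one base text) can be illustrated with a minimal sketch. Everything named here is hypothetical: `Learner`, `build_base_text`, `build_formats`, the prompt wording, and the `echo_model` stand-in are assumptions made for illustration, not Learn Your Way's actual code or the Gemini API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns generated text.
Generate = Callable[[str], str]


@dataclass
class Learner:
    grade_level: int
    interest: str


def build_base_text(source_text: str, learner: Learner, generate: Generate) -> str:
    """Layers 1 and 2: re-level the source, then personalize its examples."""
    releveled = generate(
        f"Rewrite for grade {learner.grade_level}, keeping the same scope:\n{source_text}"
    )
    return generate(
        f"Replace generic examples with ones about {learner.interest}:\n{releveled}"
    )


def build_formats(base_text: str, generate: Generate) -> Dict[str, str]:
    """Layer 3: every format is derived from the same personalized base text."""
    formats = ("immersive text", "mind map", "audio lesson", "narrated slides", "quiz")
    return {name: generate(f"Create a {name} from:\n{base_text}") for name in formats}


def echo_model(prompt: str) -> str:
    """Toy model so the sketch runs: wraps the text with the instruction it was given."""
    instruction, _, body = prompt.partition("\n")
    return f"[{instruction}] {body}"


learner = Learner(grade_level=7, interest="basketball")
base = build_base_text("An object at rest stays at rest.", learner, echo_model)
assets = build_formats(base, echo_model)
```

Because each asset is generated from `base` rather than from the raw source, the grade level and the chosen interest carry through to every format, which is the consistency property the pipeline is meant to provide.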

google

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models

Google Research is introducing Geospatial Reasoning, a new framework that integrates generative AI with specialized foundation models to streamline complex geographical problem-solving. By combining large language models like Gemini with domain-specific data, the initiative seeks to make large-scale spatial analysis accessible to sectors like public health, urban development, and climate resilience. This research effort moves beyond traditional data silos, enabling agentic workflows that can interpret diverse data types, from satellite imagery to population dynamics, through natural language.

### Specialized Foundation Models for Human Activity

* The Population Dynamics Foundation Model (PDFM) captures the complex interplay between human behaviors and their local environments.
* A dedicated trajectory-based mobility foundation model has been developed to process and analyze movement patterns.
* While initially tested in the US, experimental datasets are expanding to include the UK, Australia, Japan, Canada, and Malawi for selected partners.

### Remote Sensing and Vision Architectures

* New models utilize advanced architectures including masked autoencoders, SigLIP, MaMMUT, and OWL-ViT, specifically adapted for the remote sensing domain.
* Training involves high-resolution satellite and aerial imagery paired with text descriptions and bounding box annotations to enable precise object detection.
* The models support zero-shot classification and retrieval, allowing users to locate specific features, such as "residential buildings with solar panels," using flexible natural language queries.
* Internal evaluations show state-of-the-art performance across multiple benchmarks, including image segmentation and post-disaster damage assessment.

### Agentic Workflows and Industry Collaboration

* The Geospatial Reasoning framework utilizes LLMs like Gemini to manage complex datasets and orchestrate "agentic" workflows.
* These workflows are grounded in geospatial data to ensure that the insights generated are both useful and contextually accurate.
* Google is collaborating with inaugural industry partners, including Airbus, Maxar, Planet Labs, and WPP, to test these capabilities in real-world scenarios.

Organizations interested in accelerating their geospatial analysis should consider applying for the trusted tester program to explore how these foundation models can be fine-tuned for specific proprietary data and use cases.
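
The zero-shot retrieval mentioned under Remote Sensing and Vision Architectures is commonly implemented by embedding the text query and the image tiles into a shared vector space and ranking tiles by cosine similarity. The summary does not describe the mechanics Google uses, so the sketch below is an assumption about the general technique; the tile IDs, the `retrieve` helper, and the three-dimensional vectors are made up for illustration, and a real system would obtain embeddings from trained text and image encoders.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Vector, image_vecs: Dict[str, Vector], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank image tiles by similarity to the query embedding and keep the best top_k."""
    scored = [(tile_id, cosine(query_vec, vec)) for tile_id, vec in image_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Made-up embeddings standing in for the output of an image encoder.
tiles = {
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.1, 0.9, 0.1],
    "tile_c": [0.7, 0.3, 0.1],
}
# Made-up embedding standing in for the text query
# "residential buildings with solar panels".
query = [1.0, 0.2, 0.0]

results = retrieve(query, tiles)
```

No tile is labeled in advance with the queried category; the ranking comes entirely from how close each tile's embedding sits to the query's, which is what makes the retrieval "zero-shot."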