diffusion-models

3 posts

Google

Deep researcher with test-time diffusion

Google Cloud researchers have introduced Test-Time Diffusion Deep Researcher (TTD-DR), a framework that treats long-form research report writing as an iterative diffusion process. By mimicking human research patterns, the system treats initial drafts as "noisy" versions that are gradually polished through retrieval-augmented denoising and self-evolutionary algorithms. This approach achieves state-of-the-art results in generating comprehensive academic-style reports and solving complex multi-hop reasoning tasks.

### The Backbone DR Architecture

The system operates through a three-stage pipeline designed to transition from a broad query to a detailed final document:

* **Research Plan Generation:** Upon receiving a query, the agent produces a structured outline of key areas to guide the subsequent information-gathering process.
* **Iterative Search Agents:** Two sub-agents work in tandem; one formulates specific search questions based on the plan, while the other performs Retrieval-Augmented Generation (RAG) to synthesize precise answers from available sources.
* **Final Report Synthesis:** The agent combines the initial research plan with the accumulated question-answer pairs to produce a coherent, evidence-based final report.

### Component-wise Self-Evolution

To ensure high-quality inputs at every stage, the framework employs a self-evolutionary algorithm that optimizes the performance of individual agents (sketched in code at the end of this post):

* **Diverse Variant Generation:** The system explores multiple diverse answer variants to cover a larger search space and identify the most valuable information.
* **Environmental Feedback:** An "LLM-as-a-judge" assesses these variants using auto-raters for metrics like helpfulness and comprehensiveness, providing specific textual feedback for improvement.
* **Revision and Cross-over:** Variants undergo iterative revisions based on feedback before being merged into a single, high-quality output that consolidates the best information from all evolutionary paths.

### Report-level Refinement via Diffusion

The core innovation of TTD-DR is modeling the writing process as a denoising diffusion mechanism (also sketched below):

* **Messy-to-Polished Transformation:** The framework treats the initial rough draft as a noisy input that requires cleaning through factual verification.
* **Denoising with Retrieval:** The agent identifies missing information or weak arguments in the draft and uses search tools as a "denoising step" to inject new facts and strengthen the content.
* **Continuous Improvement Loop:** This process repeats in cycles, where each iteration uses newly retrieved information to refine the draft into a more accurate and higher-quality version.

TTD-DR demonstrates that shifting AI development from linear generation to iterative, diffusion-based refinement significantly improves the depth and rigor of long-form content. This methodology serves as a powerful blueprint for building autonomous agents capable of handling complex, multi-step knowledge tasks.
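
In digest terms, the self-evolution loop is: sample diverse variants, critique them with an LLM judge, revise, then merge. Here is a minimal Python sketch of that loop; the `llm` helper is a hypothetical stand-in for any model call, not an API from the paper.

```python
# Hypothetical sketch of component-wise self-evolution (not TTD-DR's code).
# `llm` stands in for any text-generation call.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def self_evolve(task: str, n_variants: int = 4, n_rounds: int = 2) -> str:
    # 1. Diverse variant generation: cover a larger search space.
    variants = [llm(f"Answer this research sub-task (variant {i}):\n{task}")
                for i in range(n_variants)]
    for _ in range(n_rounds):
        # 2. Environmental feedback: an LLM-as-a-judge rates each variant
        #    for helpfulness and comprehensiveness, with textual feedback.
        feedback = [llm("Critique this answer for helpfulness and "
                        f"comprehensiveness:\n{v}") for v in variants]
        # 3. Revision: rewrite each variant against its critique.
        variants = [llm(f"Revise the answer using the feedback.\nAnswer:\n{v}"
                        f"\nFeedback:\n{fb}")
                    for v, fb in zip(variants, feedback)]
    # 4. Cross-over: merge the evolved variants into one consolidated output.
    return llm("Merge these answers into a single best answer:\n"
               + "\n---\n".join(variants))
```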

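The report-level refinement can be pictured the same way: the draft is the "noisy" state and each retrieval-augmented revision is a denoising step. Again a hypothetical sketch, with `llm` and `search` as placeholder calls rather than the paper's actual tooling.

```python
# Hypothetical sketch of report-level refinement via diffusion (not the
# paper's code): a rough draft is iteratively "denoised" with retrieval.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def search(query: str) -> str:
    raise NotImplementedError("plug in a search/RAG backend here")

def write_report(query: str, steps: int = 5) -> str:
    plan = llm(f"Write a structured research plan for: {query}")
    draft = llm(f"Write a rough first draft following this plan:\n{plan}")
    for _ in range(steps):
        # Identify the weakest point of the current (noisy) draft.
        gap = llm("State the most important missing fact or weak argument "
                  f"in this draft as a search query:\n{draft}")
        evidence = search(gap)  # retrieval acts as the denoising step
        # Inject the new evidence to produce a less "noisy" draft.
        draft = llm(f"Revise the draft using this evidence:\n{evidence}\n"
                    f"Draft:\n{draft}")
    return draft
```
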
Google

Zooming in: Efficient regional environmental risk assessment with generative AI

Google Research has introduced a dynamical-generative downscaling method that combines physics-based climate modeling with probabilistic diffusion models to produce high-resolution regional environmental risk assessments. By bridging the resolution gap between global Earth system models and city-level data needs, this approach provides a computationally efficient way to quantify climate uncertainties at a 10 km scale. This hybrid technique significantly reduces error rates compared to traditional statistical methods while remaining far less computationally expensive than full-scale dynamical simulations.

## The Resolution Gap in Climate Modeling

* Traditional Earth system models typically operate at a resolution of ~100 km, which is too coarse for city-level planning around floods, heatwaves, and wildfires.
* Existing "dynamical downscaling" uses regional climate models (RCMs) to provide physically realistic 10 km projections, but its computational cost is too high to apply to large ensembles of climate data.
* Statistical downscaling offers a faster alternative but often fails to capture complex local weather patterns or extreme events, and it struggles to generalize to unprecedented future climate conditions.

## A Hybrid Dynamical-Generative Framework

* The process begins with a "physics-based first pass," in which an RCM downscales global data to an intermediate resolution of 50 km to establish a common physical representation.
* A generative AI system called "R2D2" (Regional Residual Diffusion-based Downscaling) then adds fine-scale details, such as the effects of complex topography, to reach the target 10 km resolution.
* R2D2 specifically learns the "residual", the difference between the intermediate- and high-resolution fields, which simplifies the learning task and improves the model's ability to generalize to unseen environmental conditions (see the sketch after this post).

## Efficiency and Accuracy in Risk Assessment

* The model was trained and validated using the Western United States Dynamically Downscaled Dataset (WUS-D3), which is built with the "gold standard" WRF model.
* The dynamical-generative approach reduced fine-scale errors by over 40% compared to popular statistical methods like BCSD and STAR-ESDM.
* A key advantage of this method is its scalability: the AI requires training on only one dynamically downscaled model to effectively process outputs from various other Earth system models, allowing for the rapid assessment of large climate ensembles.

By combining the physical grounding of traditional regional models with the speed of diffusion-based AI, researchers can now produce granular risk assessments that were previously cost-prohibitive. This method allows for a more robust exploration of future climate scenarios, providing essential data for farming, water management, and community protection.
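
To make the residual idea concrete, here is a minimal NumPy sketch of how the training target and the final reconstruction would be formed. The array names, the 5x upsampling factor, and the `upsample`/`sample_residual` helpers are illustrative assumptions, not the actual R2D2 code.

```python
import numpy as np

# Illustrative sketch of residual learning for downscaling (not R2D2 code).
# coarse_50km: RCM output at intermediate resolution, shape (H, W)
# fine_10km:   dynamically downscaled "truth" at target resolution, (5H, 5W)

def upsample(field: np.ndarray, factor: int = 5) -> np.ndarray:
    """Nearest-neighbor upsampling as a stand-in for a proper interpolator."""
    return np.repeat(np.repeat(field, factor, axis=0), factor, axis=1)

def residual_target(coarse_50km: np.ndarray, fine_10km: np.ndarray) -> np.ndarray:
    # Training target: the diffusion model learns only the fine-scale detail
    # the coarse field is missing (e.g. topographic effects), which is a
    # simpler, more transferable task than predicting the full field.
    return fine_10km - upsample(coarse_50km)

def reconstruct(coarse_50km: np.ndarray, sample_residual) -> np.ndarray:
    # Inference: add a residual sampled from the trained diffusion model
    # (here a placeholder callable) back onto the upsampled coarse field.
    up = upsample(coarse_50km)
    return up + sample_residual(up)
```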

LINE

How should we evaluate AI-generated images?

To optimize the Background Person Removal (BPR) feature in its image editing services, the LY Corporation AMD team evaluated various generative AI inpainting models to determine which automated metrics best align with human judgment. While traditional research benchmarks often fail to reflect performance in high-resolution, real-world scenarios, this study identifies a framework for selecting models that produce the most natural results. The research highlights that as the complexity and size of the masked area increase, the gap between models becomes more pronounced, requiring more sophisticated evaluation strategies.

### Background Person Removal Workflow

* **Instance Segmentation:** The process begins by classifying individual pixels into objects such as people, buildings, or trees within the input image.
* **Salient Object Detection:** This step distinguishes the main subjects of the photo from background elements, so that only unwanted figures are targeted for removal.
* **Inpainting Execution:** Once the background figures are removed, inpainting technology reconstructs the empty space so it blends seamlessly with the surrounding environment.

### Comparison of Inpainting Technologies

* **Diffusion-based Models:** These models, such as FLUX.1-Fill-dev, restore masked areas by gradually removing noise. While they excel at restoring complex details, they are generally slower than GANs and can occasionally generate artifacts.
* **GAN-based Models:** Using a generator-discriminator architecture, models like LaMa and HINT offer faster generation speeds and competitive performance on lower-resolution or smaller inpainting tasks.
* **Performance Discrepancy:** Experiments showed that while most models perform well on small areas, high-resolution images with large missing sections reveal significant quality differences that standard academic benchmarks do not always capture.

### Evaluation Methodology and Metrics

* **BPR Evaluation Dataset:** The team curated a dataset of 10 images with high quality variance to test 11 inpainting models released between 2022 and 2024.
* **Single-Image Quality Metrics:** LAION Aesthetics score-v2, CLIP-IQA, and Q-Align measured the aesthetic quality of individual generated images.
* **Preference and Reward Models:** PickScore, ImageReward, and HPS v2 estimated which generated images human users would prefer.
* **Objective:** The goal of these tests was to find an automated evaluation method that minimizes the need for expensive, time-consuming human reviews while maintaining high reliability.

Selecting an inpainting model based solely on paper-reported metrics is insufficient for production-level services. For features like BPR, it is critical to implement an evaluation pipeline that combines aesthetic scoring with human-preference models to ensure consistent quality across diverse, high-resolution user photos (a sketch of one such combined pipeline follows below).
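
As an illustration of what a combined pipeline could look like, here is a minimal Python sketch that ranks candidate inpainting outputs by a weighted blend of quality and preference scores. The scorer functions and weights are placeholders: the real LAION Aesthetics, CLIP-IQA, PickScore, ImageReward, and HPS v2 models each have their own loading and inference APIs.

```python
# Hypothetical sketch of an automated inpainting evaluation pipeline that
# blends single-image quality metrics with human-preference models.
# Every scorer here is a placeholder to be backed by a real metric model.

from typing import Callable, Dict, List

# A scorer maps image bytes to a score normalized to [0, 1].
Scorer = Callable[[bytes], float]

def rank_candidates(images: List[bytes],
                    quality_scorers: Dict[str, Scorer],
                    preference_scorers: Dict[str, Scorer],
                    quality_weight: float = 0.4,
                    preference_weight: float = 0.6) -> List[int]:
    """Return candidate indices sorted from best to worst combined score."""
    def combined(img: bytes) -> float:
        # Average each metric family, then take a weighted blend so that
        # both aesthetic quality and predicted human preference count.
        q = sum(s(img) for s in quality_scorers.values()) / len(quality_scorers)
        p = sum(s(img) for s in preference_scorers.values()) / len(preference_scorers)
        return quality_weight * q + preference_weight * p

    scores = [combined(img) for img in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
```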