20 years in the AWS Cloud – how time flies! AWS has reached its 20th anniversary! With a steady pace of innovation, AWS has grown to offer over 240 comprehensive cloud services and continues to launch thousands of new features annually for millions of customers. During this time…
Meta’s Ranking Engineer Agent (REA) autonomously executes key steps across the end-to-end machine learning (ML) lifecycle for ads ranking models. This post covers REA’s ML experimentation capabilities: autonomously generating hypotheses, launching training jobs, debugging failur…
This is the English version of a previously published article. What Is the Software 3.0 Era? In June 2025, Andrej Karpathy gave a talk at Y Combinator AI Startup School. He broke software's evolution into three stages. Software 1.0: What we've done for decades. Writing explicit…
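Introduction: The next-generation AI safety guide presented at NeurIPS 2025. Generative models are becoming ever more deeply embedded in our lives. LY Corporation also develops and offers a variety of AI services, and without guardrails these services can come under various attacks and expose harmful responses, or malfunction in ways such as leaking personal or confidential information. In other words, guardrails are essential infrastructure that makes AI operable in production services. Our organization, for users in a safer environment, AI servi…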
Where wild things roam: Identifying wildlife with SpeciesNet March 6, 2026 Tanya Birch, Senior Program Manager, and Dan Morris, Research Scientist, Google Research One year ago SpeciesNet, a tool that uses AI to automatically identify species in camera trap images, went open-sou…
From reactive to proactive: closing the phishing gap with LLMs 2026-03-03 Sebastian Alovisi Ayush Kumar Email security has always been defined by impermanence. It is a perpetual call-and-response arms race, where defenses are only as strong as the last bypass discovered and atta…
Sequential Attention: Making AI models leaner and faster without sacrificing accuracy February 4, 2026 Thomas Fu, Principal Engineer, and Kyriakos Axiotis, Senior Scientist, Google Research We introduce a subset selection algorithm for making large scale ML models more efficien…
Google researchers have validated that smartwatches are a highly reliable and accurate platform for estimating complex spatio-temporal gait metrics, rivaling the performance of smartphone-based methods. By utilizing a multi-head deep learning model, the study demonstrates that wrist-worn devices can provide continuous, lab-grade health insights into a user's walking speed, step length, and balance without requiring the specific pocket placement or specialized laboratory equipment previously necessary for such data.
## Multi-Head Deep Learning for Wrist-Based Sensors
* The researchers developed a temporal convolutional network (TCN) architecture designed to process raw inertial measurement unit (IMU) data, specifically 3-axis accelerometer and gyroscope signals sampled at 50 Hz.
* Unlike traditional models that only track temporal events and are prone to integration drift, this multi-head approach directly estimates both unilateral and bilateral metrics simultaneously.
* The model architecture extracts embeddings from the IMU signals and concatenates them with user height (a demographic scalar input) to improve the precision of spatial predictions.
* The system estimates a comprehensive suite of metrics, including gait speed, double support time (the proportion of time both feet are on the ground), step length, swing time, and stance time.
## Large-Scale Validation and Study Protocol
* To ensure rigorous results, the study involved a diverse cohort of 246 participants across two international sites, generating approximately 70,000 walking segments.
* Ground truth measurements were captured using a professional-grade Zeno Gait Walkway system to provide high-precision reference data for comparison.
* The study protocol included various walking conditions to test the model's versatility: a self-paced six-minute walk test (6MWT), fast-paced walking, and induced physical asymmetry created by wearing hinged knee braces at specific angles.
* Researchers employed a five-fold cross-validation strategy, ensuring that all data from a single participant remained within a single split to prevent data leakage and ensure the model generalizes to new users.
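The participant-level split described in the last bullet can be sketched as follows. This is a minimal illustration, not the study's code: the function name `participant_folds`, the greedy balancing rule, and the sample labels are all assumptions made for the example.

```python
from collections import defaultdict

def participant_folds(participant_ids, n_folds=5):
    """Assign sample indices to folds so that no participant spans two folds."""
    # Group sample indices by the participant they came from.
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)

    # Greedy balancing (an assumption): give the participant with the most
    # samples to whichever fold is currently smallest.
    folds = [[] for _ in range(n_folds)]
    order = sorted(by_participant, key=lambda p: (-len(by_participant[p]), str(p)))
    for pid in order:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_participant[pid])
    return folds

# Hypothetical walking segments labelled by participant (not the study's data).
segments = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
folds = participant_folds(segments, n_folds=5)
for k, fold in enumerate(folds):
    print(k, sorted({segments[i] for i in fold}))
```

Each fold can then serve once as the held-out set. In practice, scikit-learn's `GroupKFold` provides the same guarantee when participant IDs are passed as the `groups` argument.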
## Clinical Validity and Comparative Performance
* Smartwatch estimates demonstrated strong validity and excellent reliability, with Pearson correlation coefficients (r) and intraclass correlation coefficients (ICC) exceeding 0.80 for most metrics.
* Performance comparisons showed no statistically significant difference in Mean Absolute Percentage Error (MAPE) between the Pixel Watch and Pixel phone, establishing the smartwatch as a viable alternative to smartphone-based tracking.
* Double support time showed lower but acceptable reliability (ICC 0.56–0.60), while other metrics such as step length and gait speed remained highly consistent across different walking speeds and styles.
* The model’s success suggests that smartwatches can effectively bridge the gap in gait analysis, providing a more practical and consistent platform for continuous health tracking than handheld devices.
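For readers unfamiliar with the agreement metrics named above, here is a small sketch of how MAPE and the Pearson correlation coefficient are computed against a reference system. The gait speeds are made-up values for illustration only, and the function names are assumptions; ICC is omitted because it depends on the chosen ICC form.

```python
import math

def mape(reference, estimate):
    """Mean absolute percentage error, in percent, relative to the reference."""
    errors = [abs(e - r) / abs(r) for r, e in zip(reference, estimate)]
    return 100.0 * sum(errors) / len(errors)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up gait speeds in m/s: walkway reference vs. watch estimate.
walkway = [1.00, 1.20, 1.40, 0.80]
watch = [1.10, 1.14, 1.40, 0.84]

print(f"MAPE: {mape(walkway, watch):.1f}%")
print(f"Pearson r: {pearson_r(walkway, watch):.3f}")
```

MAPE captures the typical size of the error, while r captures whether estimates rise and fall with the reference. A device can score well on one and poorly on the other, which is why such studies usually report both.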
This research establishes smartwatches as a powerful tool for longitudinal health monitoring, enabling the detection of neurological or musculoskeletal changes through passive, continuous gait analysis in everyday environments.
Baedal Minjok (Baemin) has significantly improved its cart recommendation system by transitioning from a basic Item2Vec model to a sophisticated two-stage architecture that combines graph-based embeddings with Transformer sequence modeling. This evolution addresses the "substitutability bias" and lack of sequential context found in previous methods, allowing the system to understand the specific intent behind a user's shopping journey. By moving beyond simple item similarity, the new model effectively identifies cross-selling opportunities that align with the logical flow of a customer's purchase behavior.
### Limitations of the Item2Vec Approach
* **Substitutability Bias:** The original Item2Vec model, based on the Skip-gram architecture, tended to map items from the same category into similar vector spaces. This resulted in recommending alternative brands of the same product (e.g., suggesting another brand of milk) rather than complementary goods (e.g., cereal or bread).
* **Loss of Sequential Context:** Because Item2Vec treats a basket of goods as a "bag of words," it ignores the order in which items are added. This prevents the model from distinguishing between different user intents, such as a user starting with meat to grill versus a user starting with ingredients for a stew.
* **Failure in Cross-Selling:** The primary goal of cart recommendations is to encourage cross-selling, but the reliance on embedding similarity alone limited the diversity of suggestions, often trapping users within a single product category.
### Stage 1: Graph-Based Product and Category Embeddings
* **Node2Vec Implementation:** To combat data sparsity and the "long-tail" problem where many items have low purchase frequency, the team utilized Node2Vec. This method uses random walks to generate sequences that help the model learn structural relationships even when direct transaction data is thin.
* **Heterogeneous Graph Construction:** The graph consists of both "Item Nodes" and "Category Nodes." Connecting items to their respective categories allows the system to generate initial vectors for new or low-volume products that lack sufficient historical purchase data.
* **Association Rule Weighting:** Rather than using simple co-occurrence counts for edge weights, the team applied Association Rules. This ensures that weights reflect the actual strength of the complementary relationship, preventing popular "mega-hit" items from dominating all recommendation results.
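To see why association-rule weighting tames popular items, consider the sketch below. The summary does not say which association-rule measure the team used, so lift is an assumption here, as are the function name `lift_edge_weights` and the toy baskets.

```python
from collections import Counter
from itertools import combinations

def lift_edge_weights(baskets):
    """Return {(item_a, item_b): lift} for every pair that co-occurs in a basket."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair ordering
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    weights = {}
    for (a, b), together in pair_count.items():
        # lift = P(a and b) / (P(a) * P(b)); 1.0 means "no more than chance".
        weights[(a, b)] = (together / n) / ((item_count[a] / n) * (item_count[b] / n))
    return weights

# Toy baskets: "water" is the mega-hit item that appears everywhere.
baskets = [
    ["water", "pork", "lettuce"],
    ["water", "milk", "cereal"],
    ["water", "pork", "lettuce"],
    ["water", "milk"],
]
weights = lift_edge_weights(baskets)
for pair, lift in sorted(weights.items()):
    print(pair, round(lift, 2))
```

In this toy data, pork co-occurs with water and with lettuce equally often (twice each), so raw counts would weight both edges the same. Lift gives the pork-lettuce edge a weight of 2.0 and the pork-water edge a neutral 1.0, because water appears in every basket regardless.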
### Stage 2: Transformer-Based Sequence Recommendation
* **Capturing Purchase Context:** The second stage employs a Transformer model to analyze the sequence of items currently in the user's cart. This architecture is specifically designed to understand how the meaning of an item changes based on what preceded it.
* **Next Item Prediction:** Using the pre-trained embeddings from Stage 1 as inputs, the Transformer predicts the most likely "next item" a user will add. This allows the system to provide dynamic recommendations that evolve as the user continues to shop.
* **Integration of Category Data:** By feeding both item-level and category-level embeddings into the Transformer, the model maintains a high level of accuracy even when a user interacts with niche products, as the category context provides a fallback for the recommendation logic.
### Practical Conclusion
For production-scale recommendation systems, relying solely on item similarity often leads to redundant suggestions that do not drive incremental sales. By decoupling the learning of structural relationships (via graphs) from the learning of temporal intent (via Transformers), engineers can build a system that is robust against data sparsity while remaining highly sensitive to the immediate context of a user's session. This two-stage approach is recommended for e-commerce environments where cross-category discovery is a key business metric.
Google Research has introduced Nested Learning, a paradigm that treats machine learning models as systems of interconnected, multi-level optimization problems rather than separate architectures and training rules. By unifying structure and optimization through varying update frequencies, this approach aims to mitigate "catastrophic forgetting," the tendency for models to lose old knowledge when acquiring new skills. The researchers validated this framework through "Hope," a self-modifying architecture that outperforms current state-of-the-art models in long-context memory and language modeling.
### The Nested Learning Paradigm
This framework shifts the view of machine learning from a single continuous process to a set of coherent, nested optimization problems. Each component within a model is characterized by its own "context flow"—the specific set of information it learns from—and its own update frequency.
* The paradigm argues that architecture (structure) and optimization (training rules) are fundamentally the same concept, differing only by their level of computational depth and update rates.
* Associative memory serves as the core illustrative concept: the training process itself (backpropagation) is modeled as an associative memory that maps each data point to its local error value.
* By defining an update frequency rate for each component, researchers can order these problems into "levels," allowing for a more unified and efficient learning system inspired by the human brain's neuroplasticity.
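The idea of ordering components by update frequency can be illustrated with a toy training loop. This is only a sketch of multi-rate updates under simple assumptions (scalar parameters, plain gradient descent, a quadratic objective). It is not the Hope architecture or the paper's formulation, and every name in it is hypothetical.

```python
def train_multi_rate(levels, grad_fn, steps, lr=0.1):
    """Run gradient descent where each level updates only every `period` steps.

    levels: list of dicts with a float 'value' and an int 'period'.
    Returns how many updates each level received.
    """
    buffers = [0.0 for _ in levels]       # gradients accumulated since last update
    update_counts = [0 for _ in levels]
    for step in range(1, steps + 1):
        grads = grad_fn([lvl["value"] for lvl in levels])
        for i, lvl in enumerate(levels):
            buffers[i] += grads[i]
            if step % lvl["period"] == 0:
                # Apply the average gradient seen over this level's window.
                lvl["value"] -= lr * buffers[i] / lvl["period"]
                buffers[i] = 0.0
                update_counts[i] += 1
    return update_counts

# Toy objective: sum over levels of (value - target)^2.
targets = [1.0, -2.0]

def grad_fn(values):
    return [2.0 * (v - t) for v, t in zip(values, targets)]

levels = [
    {"value": 0.0, "period": 1},  # "fast" level: updates every step
    {"value": 0.0, "period": 8},  # "slow" level: updates every 8 steps
]
counts = train_multi_rate(levels, grad_fn, steps=64)
print(counts, [round(lvl["value"], 3) for lvl in levels])
```

The fast level adapts quickly to whatever it sees, while the slow level changes rarely and so retains older information longer. That trade-off is the intuition behind assigning different frequencies to different components.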
### Deep Optimizers and Refined Objectives
Nested Learning provides a principled way to improve standard optimization algorithms by viewing them through the lens of associative memory modules.
* Existing momentum-based optimizers often rely on simple dot-product similarity, which fails to account for how different data samples relate to one another.
* By replacing these simple similarities with standard loss metrics, such as L2 regression loss, the researchers derived new formulations for momentum that are more resilient to imperfect or noisy data.
* This approach turns the optimizer itself into a deeper learning component with its own internal optimization objective.
### Continuum Memory Systems and the "Hope" Architecture
The paradigm addresses the limitations of Large Language Models (LLMs), which are often restricted to either their immediate input window or static pre-trained knowledge.
* The researchers developed "Hope," a proof-of-concept architecture that utilizes multi-time-scale updates for its internal components.
* While standard Transformers act primarily as short-term memory, the Nested Learning approach allows for "continuum memory" that manages long-context information more effectively.
* Experimental results show that this self-modifying architecture achieves superior performance in language modeling compared to existing state-of-the-art models.
By recognizing that every part of a model is essentially an optimizer operating at a different frequency, Nested Learning offers a path toward AI that can adapt to new experiences in real-time. This structural shift moves away from the "static pre-training" bottleneck and toward systems capable of true human-like neuroplasticity and lifelong learning.
Research from Google DeepMind and Google Research introduces ForestCast, a deep learning-based framework designed to transition forest management from retrospective loss monitoring to proactive risk forecasting. By utilizing vision transformers and pure satellite data, the team has developed a scalable method to predict future deforestation that matches or exceeds the accuracy of traditional models dependent on inconsistent manual inputs. This approach provides a repeatable, future-proof benchmark for protecting biodiversity and mitigating climate change on a global scale.
### Limitations of Traditional Forecasting
* Existing state-of-the-art models rely on specialized geospatial maps, such as infrastructure development, road networks, and regional economic indicators.
* These traditional inputs are often "patchy" and inconsistent across different countries, requiring manual assembly that is difficult to replicate globally.
* Manual data sources are not future-proof; they tend to go out of date quickly with no guarantee of regular updates, unlike continuous satellite streams.
### A Scalable Pure-Satellite Architecture
* The ForestCast model adopts a "pure satellite" approach, using only raw inputs from Landsat and Sentinel-2 satellites.
* The architecture is built on vision transformers (ViTs) that process an entire tile of pixels in a single pass to capture critical spatial context and landscape-level trends.
* The model incorporates a satellite-derived "change history" layer, which identifies previously deforested pixels and the specific year the loss occurred.
* By avoiding socio-political or infrastructure maps, the method can be applied consistently to any region on Earth, allowing for meaningful cross-regional comparisons.
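One way to picture the "change history" layer is as a single raster that records, per pixel, whether and when forest loss was first observed. The sketch below assumes a simple encoding (0 for no recorded loss, otherwise the first loss year) and uses tiny made-up masks. The actual encoding used by ForestCast is not described in this summary.

```python
def change_history(yearly_loss_masks, years):
    """Collapse per-year loss masks into one raster.

    Assumed encoding: 0 = no recorded loss, otherwise the first year of loss.
    """
    rows = len(yearly_loss_masks[0])
    cols = len(yearly_loss_masks[0][0])
    history = [[0] * cols for _ in range(rows)]
    for year, mask in zip(years, yearly_loss_masks):
        for r in range(rows):
            for c in range(cols):
                # Keep the earliest year in which the pixel was flagged.
                if mask[r][c] and history[r][c] == 0:
                    history[r][c] = year
    return history

# Made-up 2x3 tile: 1 marks a pixel flagged as deforested in that year.
years = [2021, 2022, 2023]
masks = [
    [[1, 0, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 1, 0]],
]
history = change_history(masks, years)
for row in history:
    print(row)
```

A layer like this compresses the spatial pattern and timing of past clearing into a single channel, which is consistent with the finding that it is an information-dense input.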
### Key Findings and Benchmark Release
* Research indicates that "change history" is the most information-dense input; a model trained on this data alone performs almost as well as those using raw multi-spectral data.
* The model successfully predicts tile-to-tile variation in deforestation amounts and identifies the specific pixels most likely to be cleared next.
* Google has released the training and evaluation data as a public benchmark dataset, focusing initially on Southeast Asia to allow the machine learning community to verify and improve upon the results.
The release of ForestCast provides a template for scaling predictive modeling to Latin America, Africa, and boreal latitudes. Conservationists and policymakers should utilize these forecasting tools to move beyond counting historical losses and instead direct resources toward "frontline" areas where the model identifies imminent risk of habitat conversion.
DeepSomatic is an AI-powered tool developed by Google Research to identify cancer-related mutations by analyzing a tumor's genetic sequence with higher accuracy than current methods. By leveraging convolutional neural networks (CNNs), the model distinguishes between inherited genetic traits and acquired somatic variants that drive cancer progression. This flexible tool supports multiple sequencing platforms and sample types, offering a critical resource for clinicians and researchers aiming to personalize cancer treatment through precision medicine.
## Challenges in Somatic Variant Detection
* Somatic variants are genetic mutations acquired after birth through environmental exposure or DNA replication errors, making them distinct from the germline variants found in every cell of a person's body.
* Detecting these mutations is technically difficult because tumor samples are often heterogeneous, containing a diverse set of variants at varying frequencies.
* Sequencing technologies often introduce small errors that can be difficult to distinguish from actual somatic mutations, especially when the mutation is only present in a small fraction of the sampled cells.
## CNN-Based Variant Calling Architecture
* DeepSomatic employs a method pioneered by DeepVariant, which involves transforming raw genetic sequencing data into a set of multi-channel images.
* These images represent various data points, including alignment along the chromosome, the quality of the sequence output, and other technical variables.
* The convolutional neural network processes these images to differentiate between three categories: the human reference genome, non-cancerous germline variants, and the somatic mutations driving tumor growth.
* By analyzing tumor and non-cancerous cells side-by-side, the model effectively filters out sequencing artifacts that might otherwise be misidentified as mutations.
## System Versatility and Application
* The model is designed to function in multiple modes, including "tumor-normal" (comparing a biopsy to a healthy sample) and "tumor-only" mode, which is vital for blood cancers like leukemia where isolating healthy cells is difficult.
* DeepSomatic is platform-agnostic, meaning it can process data from all major sequencing technologies and adapt to different types of sample processing.
* The tool has demonstrated the ability to generalize its learning to various cancer types, even those not specifically included in its initial training sets.
## Open-Source Contributions to Precision Medicine
* Google has made the DeepSomatic tool and the CASTLE dataset—a high-quality training and evaluation set—openly available to the global research community.
* This initiative is part of a broader effort to use AI for early detection and advanced research in various cancers, including breast, lung, and gynecological cancers.
* The release aims to accelerate the development of personalized treatment plans by providing a more reliable way to identify the specific genetic drivers of an individual's disease.
By providing a more accurate and adaptable method for variant calling, DeepSomatic helps researchers pinpoint the specific drivers of a patient's cancer. This tool represents a significant advancement in deep learning for genomics, potentially shortening the path from biopsy to targeted therapeutic intervention.
Discord’s machine learning infrastructure reached a critical scaling limit as models and datasets grew beyond the capacity of single-machine systems. To overcome these bottlenecks, the engineering team transitioned to a distributed compute architecture built on the Ray framework and a suite of custom orchestration tools. This evolution moved Discord from ad-hoc experimentation to a robust production platform, resulting in significant performance gains such as a 200% improvement in business metrics for Ads Ranking.
### Overcoming Hardware and Data Bottlenecks
* Initial ML systems relied on simple classifiers that eventually evolved into complex models serving hundreds of millions of users.
* Training requirements shifted from single-machine tasks to workloads requiring multiple GPUs.
* Datasets expanded to the point where they could no longer fit on individual machines, creating a need for distributed storage and processing.
* Infrastructure growth struggled to keep pace with the exponential increase in computational demands.
### Building a Ray-Based ML Platform
* The Ray framework was adopted as the foundation for distributed computing to simplify complex ML workflows.
* Discord integrated Dagster with KubeRay to manage the orchestration of production-grade machine learning pipelines.
* Custom CLI tooling was developed to lower the barrier to entry for engineers, focusing heavily on developer experience.
* A specialized observability layer called X-Ray was implemented to provide deep insights into distributed system performance.
By prioritizing developer experience and creating accessible abstractions over raw compute power, Discord successfully industrialized its ML operations. For organizations facing similar scaling hurdles, the focus should be on building a unified platform that turns the complexity of distributed systems into a seamless tool for modelers.
Google Research and Move37 Labs have introduced NucleoBench, a comprehensive open-source benchmark for nucleic acid design, alongside AdaBeam, a high-performing new optimization algorithm. While AI models have become highly proficient at predicting the biological properties of DNA and RNA, generating optimal sequences within massive search spaces—such as the $2 \times 10^{120}$ possible variations for a 5' UTR—remains a significant hurdle. By standardizing evaluation across 16 distinct biological tasks, this research identifies AdaBeam as a superior method that scales effectively to the large-scale models required for modern drug discovery.
## Standardizing the Optimization Pipeline
The process of computational nucleic acid design typically follows a five-step workflow: data collection, training a predictive model, generating candidate sequences (the design step), wet-lab validation, and iterative retraining. NucleoBench focuses specifically on the design step, which has historically lacked standardized evaluation.
* Most existing benchmarks rely on decades-old methods like simulated annealing or vanilla genetic algorithms.
* Traditional algorithms often treat predictive models as "black boxes," failing to leverage internal model data to guide the search.
* The vastness of genomic search spaces makes brute-force optimization impossible, necessitating more intelligent, model-aware generation strategies.
## The NucleoBench Framework
NucleoBench is the first large-scale benchmark designed to compare gradient-free and gradient-based design algorithms under identical conditions. The framework encompasses over 400,000 experiments to ensure statistical rigor across diverse biological challenges.
* **Algorithm Categories**: It compares gradient-free methods (like directed evolution), which are simple but ignore model internals, against gradient-based methods (like FastSeqProp), which use the model’s internal "direction of steepest improvement" to find better sequences.
* **Task Diversity**: The 16 tasks include controlling gene expression in specific cell types (liver or neuronal), maximizing transcription factor binding, and improving chromatin accessibility.
* **Scale**: The benchmark includes long-range DNA sequence challenges using large-scale models like Enformer, which are computationally demanding but critical for understanding complex genomic interactions.
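As a concrete picture of the gradient-free family, the sketch below runs a toy directed-evolution loop that treats the scoring function as a black box. The GC-content scorer stands in for a trained predictive model and is purely hypothetical, as are the population sizes and the function names. It is not one of the NucleoBench implementations.

```python
import random

BASES = "ACGT"

def directed_evolution(score, length, generations=30, population=20, keep=5, seed=0):
    """Mutate-and-select loop that only ever calls score(sequence)."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(population)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: keep the top-scoring sequences unchanged (elitism).
        pool.sort(key=score, reverse=True)
        parents = pool[:keep]
        best_per_generation.append(score(parents[0]))
        # Mutation: each child is a parent with one position resampled.
        children = []
        while len(children) < population - keep:
            parent = rng.choice(parents)
            pos = rng.randrange(length)
            children.append(parent[:pos] + rng.choice(BASES) + parent[pos + 1:])
        pool = parents + children
    return max(pool, key=score), best_per_generation

def gc_fraction(seq):
    """Hypothetical stand-in for a predictive model: fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)

best, trace = directed_evolution(gc_fraction, length=24)
print(best, gc_fraction(best))
```

The loop never looks inside the scorer, which is what makes it simple and model-agnostic. A gradient-based method would instead ask the model which edits are predicted to raise the score, at the cost of needing access to model internals.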
## AdaBeam’s Hybrid Optimization Performance
Drawing on insights from the NucleoBench evaluation, the researchers developed AdaBeam, a hybrid algorithm that combines the strengths of various optimization strategies.
* **Success Rate**: AdaBeam outperformed existing algorithms on 11 of the 16 tasks in the benchmark.
* **Efficiency and Scaling**: Unlike many gradient-based methods that struggle with computational overhead, AdaBeam demonstrates superior scaling properties as sequences become longer and predictive models grow in complexity.
* **Methodology**: Rather than relying on a single "vanilla" algorithm developed before the era of deep learning, it combines several search techniques to navigate the sequence space more effectively.
The researchers have made AdaBeam and the NucleoBench repository freely available to the scientific community. By providing a standardized environment for testing, they aim to accelerate the development of next-generation treatments, including more stable mRNA vaccines and precise CRISPR gene therapies.
DeepPolisher is a deep learning-based genome assembly tool designed to correct base-level errors with high precision, significantly enhancing the accuracy of genomic research. By leveraging a Transformer architecture to analyze sequencing data, the tool reduces total assembly errors by 50% and insertion or deletion (indel) errors by 70%. This advancement is critical for creating near-perfect reference genomes, such as the Human Pangenome Reference, which are essential for identifying disease-causing variants and understanding human evolution.
## Limitations of Current Sequencing Technologies
* Genome assembly relies on reading nucleotides (A, T, G, and C), but the microscopic scale of these base pairs makes accurate, large-scale sequencing difficult.
* Short-read sequencing methods provide high signal strength but are limited to a few hundred nucleotides because identical DNA clusters eventually desynchronize, blending signals together.
* Long-read technologies can sequence tens of thousands of nucleotides but initially suffered from high error rates (~10%); while tools like DeepConsensus have reduced this to 0.1%, further refinement is necessary for high-fidelity reference genomes.
* Even a 0.1% error rate results in millions of inaccuracies across the 3-billion-nucleotide human genome, which can cause researchers to miss critical genetic markers or misidentify proteins.
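The arithmetic behind the last bullet is worth making explicit. As a rough back-of-the-envelope estimate, assuming errors are spread uniformly and independently across bases (a simplification), the expected number of errors is the genome length multiplied by the per-base error rate:

$$
\underbrace{3 \times 10^{9}}_{\text{nucleotides}} \times \underbrace{10^{-3}}_{0.1\%\ \text{error rate}} = 3 \times 10^{6}\ \text{expected errors}
$$

By the same reasoning, the earlier error rate of roughly 10% would correspond to about $3 \times 10^{8}$ errors, so the improvement is large but still leaves millions of bases to correct.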
## DeepPolisher Architecture and Training
* DeepPolisher is an open-source pipeline adapted from the DeepConsensus model, utilizing a Transformer-based neural network.
* The model was trained using a human cell line from the Personal Genomes Project that is estimated to be 99.99999% accurate, providing a "ground truth" for identifying and correcting errors.
* The system takes sequenced bases, their associated quality scores, and the orientation of the DNA strands to learn complex error patterns that traditional methods might miss.
* By combining sequence reads from multiple DNA molecules of the same individual, the tool iteratively "polishes" the assembly to reach the accuracy required for reference-grade data.
## Impact on Genomic Accuracy and Gene Discovery
* The tool’s ability to reduce indel errors by 70% is particularly significant, as these specific errors often interfere with the identification of protein-coding genes.
* DeepPolisher has already been integrated into major research efforts, including the enhancement of the Human Pangenome Reference, providing a more robust foundation for clinical diagnostics.
* Improved assembly accuracy allows for better mapping of regions where the genome is highly repetitive, which were previously difficult to sequence and assemble confidently.
For researchers and bioinformaticians, DeepPolisher represents a vital step in moving from "draft" genomes to high-fidelity references. Adopting this tool in assembly pipelines can drastically improve the reliability of variant calling and gene annotation, especially in complex clinical and evolutionary studies.