machine-learning

50 posts

meta

Adapting the Facebook Reels RecSys AI Model Based on User Feedback - Engineering at Meta

Meta has enhanced the Facebook Reels recommendation engine by shifting focus from traditional engagement signals, like watch time and likes, to direct user feedback. By implementing the User True Interest Survey (UTIS) model, the system now prioritizes content that aligns with genuine user preferences rather than short-term interactions alone. This shift has resulted in significant improvements in recommendation relevance, high-quality content delivery, and long-term user retention.

**Limitations of Engagement-Based Metrics**

* Traditional signals like "likes" and "watch time" are often noisy and may not reflect a user’s actual long-term interests.
* Models optimized solely for engagement tend to favor short-term value over the long-term utility of the product.
* Internal research found that previous heuristic-based interest models achieved only 48.3% precision in identifying what users truly care about.
* Effective interest matching requires understanding nuanced factors such as production style, mood, audio, and motivation, which implicit signals often miss.

**The User True Interest Survey (UTIS) Model**

* Meta collects direct feedback via randomized, single-question surveys asking users to rate video interest on a 1–5 scale.
* The raw survey data is binarized to denoise responses and weighted to correct for sampling and nonresponse bias.
* The UTIS model functions as a lightweight "alignment model layer" built on top of the main multi-task ranking system.
* The architecture uses existing model predictions as input features, supplemented by engineered features that capture content attributes and user behavior.

**Integration into the Ranking Funnel**

* **Late Stage Ranking (LSR):** The UTIS score is used as an additional input feature in the final value formula, allowing the system to boost high-interest videos and demote low-interest ones.
* **Early Stage Ranking (Retrieval):** The model aggregates survey data to reconstruct user interest profiles, helping the system source more relevant candidates during the initial retrieval phase.
* **Knowledge Distillation:** Large sequence-based retrieval models are aligned using UTIS predictions as labels through distillation objectives.

**Performance and Impact**

* The deployment of UTIS has led to a measurable increase in the delivery of niche, high-quality content.
* Generic, popularity-based recommendations that often lack depth have been reduced.
* Meta observed robust improvements across core metrics, including higher follow rates, more shares, and increased user retention.
* The system now offers better interpretability, allowing engineers to understand which specific factors contribute to a user’s sense of "interest match."

To continue improving the Reels ecosystem, Meta is doubling down on personalization, tackling sparse data and sampling bias while exploring more advanced AI architectures to further diversify recommendations.
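The survey pipeline described above (binarize 1–5 ratings, reweight for sampling and nonresponse bias, blend the UTIS score into a final value formula) can be sketched as follows. All function names, the rating cutoff, and the blending weight are illustrative assumptions, not details from the post:

```python
import numpy as np

def utis_labels(ratings, threshold=4):
    """Binarize 1-5 survey ratings into interested / not-interested labels.
    (The post describes binarization but not the cutoff; 4 is our guess.)"""
    return (np.asarray(ratings) >= threshold).astype(int)

def inverse_propensity_weights(responded, p_sampled):
    """Correct for sampling and nonresponse bias: each answered survey is
    up-weighted by the inverse of its estimated probability of being observed."""
    responded = np.asarray(responded, dtype=float)
    p_respond = responded.mean()  # crude nonresponse-rate estimate
    return responded / (p_sampled * p_respond)

def final_value(engagement_score, utis_score, alpha=0.3):
    """Blend the existing ranking score with the UTIS alignment score so
    high-interest videos are boosted and low-interest ones demoted.
    alpha is an illustrative weight, not a value from the post."""
    return (1 - alpha) * engagement_score + alpha * utis_score
```

In production the UTIS score enters as one feature among many in the value formula; the linear blend here is only the simplest possible combination.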

google

NeuralGCM harnesses AI to better simulate long-range global precipitation

NeuralGCM represents a significant evolution in atmospheric modeling by combining traditional fluid dynamics with neural networks to solve the long-standing challenge of simulating global precipitation. By training the AI component directly on high-quality NASA satellite observations rather than biased reanalysis data, the model achieves unprecedented accuracy in predicting daily weather cycles and extreme rainfall events. This hybrid approach offers a faster, more precise tool for both medium-range weather forecasting and multi-decadal climate projections.

## The Limitations of Cloud Parameterization

* Precipitation is driven by cloud processes occurring at scales as small as 100 meters, which is far below the kilometer-scale resolution of global weather models.
* Traditional models rely on "parameterizations," or mathematical approximations, to estimate how these small-scale events affect the larger atmosphere.
* Because these approximations are often simplified, traditional models struggle to accurately capture the complexity of water droplet formation and ice crystal growth, leading to errors in long-term forecasts.

## Training on Direct Satellite Observations

* Unlike previous AI models trained on "reanalyses"—which are essentially simulations used to fill observational gaps—NeuralGCM is trained on NASA satellite-based precipitation data spanning 2001 to 2018.
* The model utilizes a differentiable dynamical core, an architecture that allows the neural network to learn the effects of small-scale events directly from physical observations.
* By bypassing the weaknesses inherent in reanalysis data, the model effectively creates a machine-learned parameterization that is more faithful to real-world cloud physics.

## Performance in Weather and Climate Benchmarks

* At a resolution of 280 km, NeuralGCM outperforms leading operational models in medium-range forecasts (up to 15 days) and matches the precision of sophisticated multi-decadal climate models.
* The model shows a marked improvement in capturing precipitation extremes, particularly for the top 0.1% of rainfall events.
* Evaluation through WeatherBench 2 demonstrates that NeuralGCM accurately reproduces the diurnal (daily) weather cycle, a metric where traditional physics-based models frequently fall short.

NeuralGCM provides a highly efficient and accessible framework for researchers and city planners who need to simulate long-range climate scenarios, such as 100-year storms or seasonal agricultural cycles. Its ability to maintain physical consistency while leveraging the speed of AI makes it a powerful candidate for the next generation of global atmospheric modeling.
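As a rough illustration of the hybrid idea (a differentiable physics core plus a learned sub-grid correction), the toy step below adds a trainable tendency to a coarse 1-D solver. Everything here is a stand-in of our own invention: NeuralGCM's actual dynamical core, resolution, and neural network are far more sophisticated:

```python
import numpy as np

def physics_tendency(state, diffusion=0.1):
    """Resolved dynamics stand-in: 1-D diffusion on a periodic grid
    (a toy, not NeuralGCM's actual spectral dynamical core)."""
    return diffusion * (np.roll(state, 1) - 2 * state + np.roll(state, -1))

def learned_correction(state, weights):
    """Neural-network stand-in: a tiny nonlinear map playing the role of
    the machine-learned parameterization for sub-grid cloud processes."""
    return np.tanh(state) * weights

def hybrid_step(state, weights, dt=0.01):
    """One hybrid time step: resolved physics plus learned sub-grid tendency.
    Because both terms are differentiable, `weights` could be trained
    end-to-end against observations, the idea behind a differentiable core."""
    return state + dt * (physics_tendency(state) + learned_correction(state, weights))
```

The point of the sketch is the structure: observations constrain the sum of known physics and a learned residual, so the network only has to model what the solver cannot resolve.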

toss

Toss's AI Technology Recognized

Toss ML Engineer Jin-woo Lee presents FedLPA, a novel Federated Learning algorithm accepted at NeurIPS 2025 that addresses the critical challenges of data sovereignty and non-uniform data distributions. By allowing AI models to learn from localized data without transferring sensitive information across borders, this research provides a technical foundation for expanding services like Toss Face Pay into international markets with strict privacy regulations.

### The Challenge of Data Sovereignty in Global AI

* Traditional AI development requires centralizing data on a single server, which is often impossible due to international privacy laws and data sovereignty regulations.
* Federated Learning offers a solution by sending the model to the user’s device (client) rather than moving the data, ensuring raw biometric information never leaves the local environment.
* Standard Federated Learning fails in real-world scenarios where data is non-IID (not Independent and Identically Distributed), meaning user patterns in different countries or regions vary significantly.

### Overcoming Limitations in Category Discovery

* Existing models assume all users share similar data distributions and that all data classes are known beforehand, which leads to performance degradation when encountering new demographics.
* FedLPA incorporates Generalized Category Discovery (GCD) to identify both known classes and entirely "novel classes" (e.g., new fraud patterns or ethnic features) that were not present in the initial training set.
* This approach prevents the model from becoming obsolete as it encounters new environments, allowing it to adapt to local characteristics autonomously.

### The FedLPA Three-Step Learning Pipeline

* **Confidence-guided Local Structure Discovery (CLSD):** The system builds a similarity graph by comparing feature vectors of local data. It refines these connections using "high-confidence" samples—data points the model is certain about—to strengthen the quality of the relational map.
* **InfoMap Clustering:** Instead of requiring a human to pre-define the number of categories, the algorithm uses the InfoMap community detection method. This allows the client to automatically estimate the number of unique categories within its own local data through random walks on the similarity graph.
* **Local Prior Alignment (LPA):** The model uses self-distillation to ensure consistent predictions across different views of the same data. Most importantly, an LPA regularizer forces the model’s prediction distribution to align with the "Empirical Prior" discovered in the clustering phase, preventing the model from becoming biased toward over-represented classes.

### Business Implications and Strategic Value

* **Regulatory Compliance:** FedLPA removes technical barriers to entry for markets like the EU or Southeast Asia by maintaining high model performance while strictly adhering to local data residency requirements.
* **Hyper-personalization:** Financial services such as Fraud Detection Systems (FDS) and Credit Scoring Systems (CSS) can be trained on local patterns, allowing for more accurate detection of region-specific scams or credit behaviors.
* **Operational Efficiency:** By enabling models to self-detect and learn from new patterns without manual labeling or central intervention, the system significantly reduces the cost and time required for global maintenance.

Implementing localized Federated Learning architectures like FedLPA is a recommended strategy for tech organizations seeking to scale AI services internationally while navigating the complex landscape of global privacy regulations and diverse data distributions.
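A minimal sketch of the first two pipeline steps, under heavy simplification: a cosine-similarity graph with a confidence threshold stands in for CLSD, and connected components stand in for InfoMap's random-walk community detection (which the paper actually uses). Names and the threshold value are ours:

```python
import numpy as np

def similarity_graph(feats, sim_threshold=0.9):
    """Connect feature vectors whose cosine similarity clears a confidence
    threshold (a simplified stand-in for CLSD's high-confidence pruning)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    n = len(feats)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= sim_threshold]

def estimate_num_categories(n, edges):
    """Estimate the local category count as the number of connected
    components via union-find (a cheap stand-in for InfoMap, which would
    find communities through random walks on the same graph)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})
```

The key property FedLPA relies on survives even in this toy: the client estimates how many categories exist in its own data, with no pre-defined class count.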

meta

DrP: Meta's Root Cause Analysis Platform at Scale - Engineering at Meta

DrP is Meta’s programmatic root cause analysis (RCA) platform designed to automate incident investigations and reduce the burden of manual on-call tasks. By codifying investigation playbooks into executable "analyzers," the platform lowers mean time to resolve (MTTR) by 20% to 80% for over 300 teams. This systematic approach replaces outdated manual scripts with a scalable backend that executes 50,000 automated analyses daily, providing immediate context when alerts fire.

## Architecture and Core Components

* **Expressive SDK:** Provides a framework for engineers to codify investigation workflows into "analyzers," utilizing a rich library of helper functions and machine learning algorithms.
* **Built-in Analysis Tools:** The platform includes native support for anomaly detection, event isolation, time-series correlation, and dimension analysis to identify specific problem areas.
* **Scalable Backend:** A multi-tenant execution environment manages a worker pool that handles thousands of requests securely and asynchronously.
* **Workflow Integration:** DrP is integrated directly into Meta’s internal alerting and incident management systems, allowing for automatic triggering without human intervention.

## Authoring and Verification Workflow

* **Template Bootstrapping:** Engineers use the SDK to generate boilerplate code that captures required input parameters and context in a type-safe manner.
* **Analyzer Chaining:** The system allows for seamless dependency analysis by passing context between different analyzers, enabling investigations to span multiple interconnected services.
* **Automated Backtesting:** Before deployment, analyzers undergo automated backtesting integrated into the code review process to ensure accuracy and performance.
* **Decision Tree Logic:** Investigation steps are modeled as decision trees within the code, allowing the analyzer to follow different paths based on the data it retrieves.

## Execution and Post-Processing

* **Trigger-based Analysis:** When an alert is activated, the backend automatically queues the relevant analyzer, ensuring findings are available as soon as an engineer begins triaging.
* **Automated Mitigation:** A post-processing system can take direct action based on investigation results, such as creating tasks or submitting pull requests to resolve identified issues.
* **DrP Insights:** This system periodically reviews historical analysis outputs to identify and rank the top causes of alerts, helping teams prioritize long-term reliability fixes.
* **Alert Annotation:** Results are presented in both human-readable text and machine-readable formats, directly annotating the incident logs for the on-call responder.

## Practical Conclusion

Organizations managing large-scale distributed systems should transition from static markdown playbooks to executable investigation code. By implementing a programmatic RCA framework like DrP, teams can scale their troubleshooting expertise and significantly reduce "on-call fatigue" by automating the repetitive triage steps that typically consume the first hour of an incident.
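The decision-tree and chaining ideas can be illustrated with a toy analyzer. The names, thresholds, and `Finding` type below are invented for illustration and are not Meta's internal SDK:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    cause: str
    evidence: dict = field(default_factory=dict)

def latency_analyzer(metrics, run_dependency):
    """Toy analyzer modeling an investigation as a decision tree, in the
    spirit of DrP's codified playbooks (logic invented for illustration)."""
    if metrics["error_rate"] > 0.05:
        # Branch 1: errors dominate, so chain into a dependency analyzer,
        # passing context along as DrP's analyzer chaining does.
        return run_dependency({"suspect": "downstream_service"})
    if metrics["p99_latency_ms"] > 500:
        # Branch 2: a pure latency regression local to this service.
        return Finding("local_latency_regression",
                       {"p99": metrics["p99_latency_ms"]})
    return Finding("no_anomaly_detected")

def dependency_analyzer(ctx):
    """Chained analyzer that would inspect the suspected dependency."""
    return Finding("dependency_errors", {"service": ctx["suspect"]})
```

Because each branch returns a structured `Finding`, the same output can feed both the human-readable alert annotation and an automated post-processing step, mirroring the dual-format results the platform produces.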

woowahan

Delivering the Future: Global

The Global Hackathon 2025 served as a massive collaborative initiative to unite over 270 technical employees from seven global entities under DeliveryHero’s umbrella, including Woowa Brothers. By leveraging the community-building expertise of the Woowahan DevRel team, the event successfully bridged geographical and technical gaps to foster innovation in "Delivering the Future." The hackathon concluded with high-level recognition from global leadership and a strategic partnership with Google Cloud, demonstrating the power of synchronized global technical synergy.

## Strategic Planning and Global Coordination

* The event adopted a hybrid "Base Camp" model, where participants worked from their local entity offices while staying connected through 24-hour live streaming and centralized online channels.
* Organizers meticulously navigated the logistical hurdles of spanning 70 countries, including coordinating across vastly different time zones and respecting local public holidays and vacation seasons.
* Efficiency was maintained through a decentralized communication strategy, using entity-specific meetings and comprehensive guidebooks rather than frequent global meetings to prevent "meeting fatigue" across time zones.

## Technical Infrastructure and Regulatory Compliance

* To accommodate diverse technical preferences, the infrastructure had to support various stacks, including AWS, Google Cloud Platform (GCP), and specific machine learning models.
* The central organization team addressed complex regulatory challenges, ensuring all sandbox environments complied with strict global security standards and GDPR (EU General Data Protection Regulation).
* A strategic partnership with Google Cloud provided a standardized Google AI-based environment, enabling teams to experiment rapidly with mature tools and cloud-native services.

## Local Operations and Cross-Entity Collaboration

* Physical office spaces were transformed into immersive hackathon hubs to maintain the high-intensity atmosphere characteristic of offline coding marathons.
* The event encouraged "office sharing" between entities located in the same city and even supported travel for members to join different regional base camps, fostering a truly global networking culture.
* Local supporters used standardized checklists and operational frameworks to ensure a consistent experience for participants, whether they were in Seoul, Berlin, or Dubai.

Building a successful global technical event requires a delicate balance between centralized infrastructure and local autonomy. For organizations operating across multiple regions, investing in shared technical sandboxes and robust communication frameworks is essential for turning fragmented local talent into a unified global innovation engine.

google

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments.

## Collaborative Framework and Objectives

* The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa.
* The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward the execution of practical healthcare tools.
* Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts.
* Submissions were evaluated based on their innovation, technical feasibility, and contextual relevance to African health systems.

## Technical Foundations and Google Health Models

* Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks.
* The competition utilized a two-phase journey: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase.
* Technical implementations frequently involved advanced techniques such as Retrieval-Augmented Generation (RAG) to ensure alignment with local medical protocols and WHO guidelines.
* Fine-tuning methods, specifically Low-Rank Adaptation (LoRA), were utilized by teams to specialize large-scale models like MedGemma-27B-IT for niche datasets.

## Innovative Solutions for Regional Health

* **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini RAG for clinical guidance.
* **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images.
* **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using Speech-to-Text (STT) and Text-to-Speech (TTS) technologies.
* **HexAI (DermaDetect):** Recognized for the best proof-of-concept, this offline-first mobile app allows community health workers to triage skin conditions using on-device versions of MedSigLIP, specifically designed for low-connectivity areas.

The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, this model of providing foundational tools and cloud resources to local experts remains a highly effective strategy for sustainable innovation.
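The LoRA technique the teams applied can be sketched in a few lines: a frozen weight matrix is augmented by a scaled low-rank product, and initializing B to zero makes the adapter start as a no-op. Shapes, the rank, and `alpha` below are illustrative, not the teams' settings:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass: the frozen weight W is augmented by a low-rank
    update (B @ A) scaled by alpha / rank. Shapes: W is (out, in),
    A is (r, in), B is (out, r). A minimal sketch of the technique,
    not the teams' MedGemma training code."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # B starts at zero, so the adapter
x = rng.normal(size=(1, d_in))          # initially leaves W unchanged
```

Only A and B are trained, so adapting a 27B-parameter model touches a tiny fraction of its weights, which is why LoRA is practical on the compute budgets available at a hackathon.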

woowahan

Enhancing the “Frequently

Baedal Minjok (Baemin) has significantly improved its cart recommendation system by transitioning from a basic Item2Vec model to a sophisticated two-stage architecture that combines graph-based embeddings with Transformer sequence modeling. This evolution addresses the "substitutability bias" and lack of sequential context found in previous methods, allowing the system to understand the specific intent behind a user's shopping journey. By moving beyond simple item similarity, the new model effectively identifies cross-selling opportunities that align with the logical flow of a customer's purchase behavior.

### Limitations of the Item2Vec Approach

* **Substitutability Bias:** The original Item2Vec model, based on the Skip-gram architecture, tended to map items from the same category into similar vector spaces. This resulted in recommending alternative brands of the same product (e.g., suggesting another brand of milk) rather than complementary goods (e.g., cereal or bread).
* **Loss of Sequential Context:** Because Item2Vec treats a basket of goods as a "bag of words," it ignores the order in which items are added. This prevents the model from distinguishing between different user intents, such as a user starting with meat to grill versus a user starting with ingredients for a stew.
* **Failure in Cross-Selling:** The primary goal of cart recommendations is to encourage cross-selling, but the reliance on embedding similarity alone limited the diversity of suggestions, often trapping users within a single product category.

### Stage 1: Graph-Based Product and Category Embeddings

* **Node2Vec Implementation:** To combat data sparsity and the "long-tail" problem where many items have low purchase frequency, the team utilized Node2Vec. This method uses random walks to generate sequences that help the model learn structural relationships even when direct transaction data is thin.
* **Heterogeneous Graph Construction:** The graph consists of both "Item Nodes" and "Category Nodes." Connecting items to their respective categories allows the system to generate initial vectors for new or low-volume products that lack sufficient historical purchase data.
* **Association Rule Weighting:** Rather than using simple co-occurrence counts for edge weights, the team applied Association Rules. This ensures that weights reflect the actual strength of the complementary relationship, preventing popular "mega-hit" items from dominating all recommendation results.

### Stage 2: Transformer-Based Sequence Recommendation

* **Capturing Purchase Context:** The second stage employs a Transformer model to analyze the sequence of items currently in the user's cart. This architecture is specifically designed to understand how the meaning of an item changes based on what preceded it.
* **Next Item Prediction:** Using the pre-trained embeddings from Stage 1 as inputs, the Transformer predicts the most likely "next item" a user will add. This allows the system to provide dynamic recommendations that evolve as the user continues to shop.
* **Integration of Category Data:** By feeding both item-level and category-level embeddings into the Transformer, the model maintains a high level of accuracy even when a user interacts with niche products, as the category context provides a fallback for the recommendation logic.

### Practical Conclusion

For production-scale recommendation systems, relying solely on item similarity often leads to redundant suggestions that do not drive incremental sales. By decoupling the learning of structural relationships (via graphs) from the learning of temporal intent (via Transformers), engineers can build a system that is robust against data sparsity while remaining highly sensitive to the immediate context of a user's session. This two-stage approach is recommended for e-commerce environments where cross-category discovery is a key business metric.
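The association-rule edge weighting described in Stage 1 can be sketched with lift as the rule-strength metric. The post does not say which association metric Baemin used, so lift (co-occurrence probability normalized by the items' individual popularities) is our assumption for illustration:

```python
from collections import Counter
from itertools import combinations

def association_weights(baskets):
    """Weight item-item edges by lift = P(a,b) / (P(a) * P(b)) instead of
    raw co-occurrence counts, so mega-hit items that appear in every basket
    do not dominate every edge (a sketch of the idea, not Baemin's code)."""
    n = len(baskets)
    item_count, pair_count = Counter(), Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        pair_count.update(frozenset(p) for p in combinations(sorted(items), 2))
    weights = {}
    for pair, c in pair_count.items():
        a, b = tuple(pair)
        p_a, p_b, p_ab = item_count[a] / n, item_count[b] / n, c / n
        weights[pair] = p_ab / (p_a * p_b)  # lift of the rule a <-> b
    return weights
```

A lift near 1 means two items co-occur no more than popularity alone would predict, so a ubiquitous item earns no special edge weight just for being everywhere.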

aws

AWS Weekly Roundup: AWS re:Invent keynote recap, on-demand videos, and more (December 8, 2025) | AWS News Blog

The December 8, 2025, AWS Weekly Roundup recaps the major themes from AWS re:Invent, signaling a significant industry transition from AI assistants to autonomous AI agents. While technical innovation in infrastructure remains a priority, the event underscored that developers remain at the heart of the AWS mission, empowered by new tools to automate complex tasks using natural language. This shift represents a "renaissance" in cloud computing, where purpose-built infrastructure is now designed to support the non-deterministic nature of agentic workloads.

## Community Recognition and the Now Go Build Award

* Raphael Francis Quisumbing (Rafi) from the Philippines was honored with the Now Go Build Award, presented by Werner Vogels.
* A veteran of the ecosystem, Quisumbing has served as an AWS Hero since 2015 and has co-led the AWS User Group Philippines for over a decade.
* The recognition emphasizes AWS's continued focus on community dedication and the role of individual builders in empowering regional developer ecosystems.

## The Evolution from AI Assistants to Agents

* AWS CEO Matt Garman identified AI agents as the next major inflection point for the industry, moving beyond simple chat interfaces to systems that perform tasks and automate workflows.
* Dr. Swami Sivasubramanian highlighted a paradigm shift where natural language serves as the primary interface for describing complex goals.
* These agents are designed to autonomously generate plans, write necessary code, and call various tools to execute complete solutions without constant human intervention.
* AWS is prioritizing the development of production-ready infrastructure that is secure and scalable specifically to handle the "non-deterministic" behavior of these AI agents.

## Core Infrastructure and the Developer Renaissance

* Despite the focus on AI, AWS reaffirmed that its core mission remains the "freedom to invent," keeping developers central to its 20-year strategy.
* Leaders Peter DeSantis and Dave Brown reinforced that foundational attributes—security, availability, and performance—remain the non-negotiable pillars of the AWS cloud.
* The integration of AI agents is framed as a way to finally realize material business returns on AI investments by moving from experimental use cases to automated business logic.

To maximize the value of these updates, organizations should begin evaluating how to transition from simple LLM implementations to agentic frameworks that can execute end-to-end business processes. Reviewing the on-demand keynote sessions from re:Invent 2025 is recommended for technical teams looking to implement the latest secure, agent-ready infrastructure.

aws

Amazon Bedrock adds reinforcement fine-tuning simplifying how developers build smarter, more accurate AI models | AWS News Blog

Amazon Bedrock has introduced reinforcement fine-tuning, a new model customization capability that allows developers to build more accurate and cost-effective AI models using feedback-driven training. By replacing the requirement for massive labeled datasets with reward signals, the platform enables average accuracy gains of 66% while automating the complex infrastructure typically associated with advanced machine learning. This approach allows organizations to optimize smaller, faster models for specific business needs without sacrificing performance or incurring the high costs of larger model variants.

**Challenges of Traditional Model Customization**

* Traditional fine-tuning often requires massive, high-quality labeled datasets and expensive human annotation, which can be a significant barrier for many organizations.
* Developers previously had to choose between settling for generic "out-of-the-box" results and managing the high costs and complexity of large-scale infrastructure.
* The high barrier to entry for advanced reinforcement learning techniques often required specialized ML expertise that many development teams lack.

**Mechanics of Reinforcement Fine-Tuning**

* The system uses an iterative feedback loop where models improve based on reward signals that judge the quality of responses against specific business requirements.
* Reinforcement Learning with Verifiable Rewards (RLVR) utilizes rule-based graders to provide objective feedback for tasks such as mathematics or code generation.
* Reinforcement Learning from AI Feedback (RLAIF) uses AI-driven evaluations to help models understand preference and quality without manual human intervention.
* The workflow can be powered by existing API logs within Amazon Bedrock or by uploading training datasets, eliminating the need for complex infrastructure setup.

**Performance and Security Advantages**

* The technique achieves an average accuracy improvement of 66% over base models, enabling smaller models to perform at the level of much larger alternatives.
* Current support includes the Amazon Nova 2 Lite model, which helps developers optimize for both speed and price-to-performance.
* All training data and customization processes remain within the secure AWS environment, ensuring that proprietary data is protected and compliant with organizational security standards.

Developers should consider reinforcement fine-tuning as a primary strategy for optimizing smaller models like Amazon Nova 2 Lite to achieve high-tier performance at a lower cost. This capability is particularly recommended for specialized tasks like reasoning and coding, where objective reward functions can be used to rapidly iterate and improve model accuracy.
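A rule-based grader of the kind RLVR relies on can be as simple as checking a verifiable final answer. The helper below is purely illustrative (our own parsing convention, not Bedrock's grader interface):

```python
def math_grader(expected_answer, model_output):
    """Rule-based grader for RLVR-style training: return a verifiable
    reward by comparing the last parseable number in the model's output
    against a known ground truth. (Illustrative only.)"""
    tokens = model_output.replace("=", " ").split()
    for tok in reversed(tokens):
        try:
            return 1.0 if abs(float(tok) - expected_answer) < 1e-6 else 0.0
        except ValueError:
            continue  # not a number, keep scanning backwards
    return 0.0  # no parseable number means zero reward
```

Because the reward is computed mechanically, no human labels are needed: each sampled response is graded on the spot, and the feedback loop pushes the model toward answers the grader can verify.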

aws

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning | AWS News Blog

Amazon SageMaker AI has introduced a new serverless customization capability designed to accelerate the fine-tuning of popular models like Llama, DeepSeek, and Amazon Nova. By automating resource provisioning and providing an intuitive interface for advanced reinforcement learning techniques, this feature reduces the model customization lifecycle from months to days. This end-to-end workflow allows developers to focus on model performance rather than infrastructure management, from initial training through to final deployment.

**Automated Infrastructure and Model Support**

* The service provides a serverless environment where SageMaker AI automatically selects and provisions compute resources based on the specific model architecture and dataset size.
* Supported models include a broad range of high-performance options such as Amazon Nova, DeepSeek, GPT-OSS, Meta Llama, and Qwen.
* The feature is accessible directly through the Amazon SageMaker Studio interface, allowing users to manage their entire model catalog in one location.

**Advanced Customization and Reinforcement Learning**

* Users can choose from several fine-tuning techniques, including traditional Supervised Fine-Tuning (SFT) and more advanced methods.
* The platform supports modern optimization techniques such as Direct Preference Optimization (DPO), Reinforcement Learning from Verifiable Rewards (RLVR), and Reinforcement Learning from AI Feedback (RLAIF).
* To simplify the process, SageMaker AI provides recommended defaults for hyperparameters like batch size, learning rate, and epochs based on the selected tuning technique.

**Experiment Tracking and Security**

* The workflow introduces a serverless MLflow application, enabling seamless experiment tracking and performance monitoring without additional setup.
* Advanced configuration options allow for fine-grained control over network encryption and storage volume encryption to ensure data security.
* The "Continue customization" feature allows for iterative tuning, where users can adjust hyperparameters or apply different techniques to an existing customized model.

**Evaluation and Deployment Flexibility**

* Built-in evaluation tools allow developers to compare the performance of their customized models against the original base models to verify improvements.
* Once a model is finalized, it can be deployed with a few clicks to either Amazon SageMaker or Amazon Bedrock.
* A centralized "My Models" dashboard tracks all custom iterations, providing detailed logs and status updates for every training and evaluation job.

This serverless approach is highly recommended for teams that need to adapt large language models to specific domains quickly without the operational overhead of managing GPU clusters. By utilizing the integrated evaluation and multi-platform deployment options, organizations can transition from experimentation to production-ready AI more efficiently.
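Among the supported techniques, DPO has a compact per-pair loss that is easy to sketch. The function below is the generic textbook form with an illustrative `beta`; it is not a SageMaker API, and the service handles this math internally:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * ((logp_chosen - ref_chosen)
                         - (logp_rejected - ref_rejected))).
    The policy's log-probabilities are compared against a frozen reference
    model, so the policy is rewarded for preferring the chosen response
    more than the reference does. beta is an illustrative strength."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; increasing the chosen response's log-probability drives the loss down, which is the whole training signal, with no separate reward model required.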

aws

Introducing checkpointless and elastic training on Amazon SageMaker HyperPod | AWS News Blog

Amazon SageMaker HyperPod has introduced checkpointless and elastic training features to accelerate AI model development by minimizing infrastructure-related downtime. These advancements replace traditional, slow checkpoint-restart cycles with peer-to-peer state recovery and enable training workloads to scale dynamically based on available compute capacity. By decoupling training progress from static hardware configurations, organizations can significantly reduce model time-to-market while maximizing cluster utilization.

**Checkpointless Training and Rapid State Recovery**

* Replaces the traditional five-stage recovery process—including job termination, network setup, and checkpoint retrieval—which can often take up to an hour on self-managed clusters.
* Utilizes peer-to-peer state replication and in-process recovery to allow healthy nodes to restore the model state instantly without restarting the entire job.
* Incorporates technical optimizations such as collective communications initialization and memory-mapped data loading to enable efficient data caching.
* Reduces recovery downtime by over 80% based on internal studies of clusters with up to 2,000 GPUs, and was a core technology used in the development of Amazon Nova models.

**Elastic Training and Automated Cluster Scaling**

* Allows AI workloads to automatically expand to use idle cluster capacity as it becomes available and contract when resources are needed for higher-priority tasks.
* Reduces the need for manual intervention, saving hours of engineering time previously spent reconfiguring training jobs to match fluctuating compute availability.
* Optimizes total cost of ownership by ensuring that training momentum continues even as inference volumes peak and pull resources away from the training pool.
* Orchestrates these transitions seamlessly through the HyperPod training operator, ensuring that model development is not disrupted by infrastructure changes.
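The memory-mapped data loading mentioned above can be illustrated with a small stdlib sketch: the OS page cache keeps recently touched pages warm, so a process that recovers in place can re-read its data slice without re-deserializing the whole file. The file layout and `load_batch` helper here are hypothetical, not HyperPod's actual implementation:

```python
import mmap
import os
import struct
import tempfile

# Write a toy dataset of float32 values to disk (a stand-in for a training shard).
path = os.path.join(tempfile.mkdtemp(), "shard.bin")
values = [float(i) for i in range(1000)]
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}f", *values))

def load_batch(path, start, count):
    """Read `count` float32 samples starting at index `start` via mmap.

    Only the touched pages are faulted in; a restarted worker re-reading
    the same region mostly hits the page cache instead of cold storage.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = mm[start * 4 : (start + count) * 4]  # 4 bytes per float32
        return list(struct.unpack(f"{count}f", raw))

print(load_batch(path, 10, 3))  # [10.0, 11.0, 12.0]
```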
For teams managing large-scale AI workloads, adopting these features can reclaim significant development time and lower operational costs by preventing idle cluster periods. Organizations scaling to thousands of accelerators should prioritize checkpointless training to mitigate the impact of hardware faults and maintain continuous training momentum.

google

From Waveforms to Wisdom: The New Benchmark for Auditory Intelligence

Google Research has introduced the Massive Sound Embedding Benchmark (MSEB) to unify the fragmented landscape of machine sound intelligence. By standardizing the evaluation of eight core auditory capabilities across diverse datasets, the framework reveals that current sound representations are far from universal and have significant performance "headroom" for improvement. Ultimately, MSEB provides an open-source platform to drive the development of general-purpose sound embeddings for next-generation multimodal AI.

### Diverse Datasets for Real-World Scenarios

The benchmark utilizes a curated collection of high-quality, accessible datasets designed to reflect global diversity and complex acoustic environments.

* **Simple Voice Questions (SVQ):** A foundational dataset featuring 177,352 short spoken queries across 17 languages and 26 locales, recorded in varying conditions like traffic and media noise.
* **Speech-MASSIVE:** Used for multilingual spoken language understanding and intent classification.
* **FSD50K:** A large-scale dataset for environmental sound event recognition containing 200 classes based on the AudioSet Ontology.
* **BirdSet:** A massive-scale benchmark specifically for avian bioacoustics and complex soundscape recordings.

### Eight Core Auditory Capabilities

MSEB is structured around "super-tasks" that represent the essential functions an intelligent auditory system must perform within a multimodal context.

* **Retrieval and Reasoning:** These tasks simulate voice search and the ability of an assistant to find precise answers within documents based on spoken questions.
* **Classification and Transcription:** Standard perception tasks that categorize sounds by environment or intent and convert audio signals into verbatim text.
* **Segmentation and Clustering:** These involve identifying and localizing salient terms with precise timestamps and grouping sound samples by shared attributes without predefined labels.
* **Reranking and Reconstruction:** Advanced tasks that reorder ambiguous text hypotheses to match spoken queries and test embedding quality by regenerating original audio waveforms.

### Unified Evaluation and Performance Goals

The framework is designed to move beyond fragmented research by providing a consistent structure for evaluating different model architectures.

* **Model Agnostic:** The open framework allows for the evaluation of uni-modal, cascade, and end-to-end multimodal embedding models.
* **Objective Baselines:** By establishing clear performance goals, the benchmark highlights specific research opportunities where current state-of-the-art models fall short of their potential.
* **Multimodal Integration:** Every task assumes sound is the critical input but incorporates other modalities, such as text context, to better simulate real-world AI interactions.

By providing a comprehensive roadmap for auditory intelligence, MSEB encourages the community to move toward universal sound embeddings. Researchers can contribute to this evolving standard by accessing the open-source GitHub repository and utilizing the newly released datasets on Hugging Face to benchmark their own models.
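As a rough illustration of how an embedding model might be scored on a retrieval-style super-task, the sketch below computes recall@k over toy query and document embeddings ranked by cosine similarity. MSEB's actual metrics and harness may differ; the embeddings and function names here are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_at_k(query_embs, doc_embs, relevant, k=1):
    """Fraction of queries whose relevant document index appears in the
    top-k documents ranked by cosine similarity."""
    hits = 0
    for q, rel in zip(query_embs, relevant):
        ranked = sorted(range(len(doc_embs)),
                        key=lambda i: cosine(q, doc_embs[i]), reverse=True)
        if rel in ranked[:k]:
            hits += 1
    return hits / len(query_embs)

# Toy 2-D embeddings: query 0 points toward doc 0, query 1 toward doc 2.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.6, 0.8]]
print(recall_at_k(queries, docs, relevant=[0, 2], k=1))  # 1.0
```

In practice the embeddings would come from the sound model under evaluation, and the relevant set from benchmark annotations; the metric itself stays this simple.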

toss

Toss Next ML Challenge - Ad

Toss recently hosted the "Toss Next ML Challenge," a large-scale competition focused on predicting advertisement click-through rates (CTR) using real-world, anonymized data from the Toss app. By tasking over 2,600 participants with developing high-performance models under real-time serving constraints, the event surfaced innovative technical approaches to feature engineering and model ensembling.

### Designing a Real-World CTR Prediction Task

* The competition required participants to predict the probability of a user clicking a display ad based on a dataset of 10.7 million training samples.
* Data included anonymized features such as age, gender, ad inventory IDs, and historical user behavior.
* A primary technical requirement was "real-time navigability," meaning models had to be optimized for fast inference to function within a live service environment.

### Overcoming Anonymization with Sequence Engineering

* To maintain data privacy while allowing external access, Toss provided anonymized features in a single flattened table, which limited participants' ability to perform traditional data joins.
* A complex, raw "Sequence" feature was intentionally left unprocessed to serve as a differentiator for high-performing teams.
* Top-tier participants demonstrated extreme persistence by deriving up to 37 unique variables from this single sequence, including transition probabilities, unique token counts, and sequence lengths.

### Winning Strategies and Technical Trends

* All of the top 30 teams utilized boosting-tree models (such as XGBoost or LightGBM), while deep learning appeared in only a subset of solutions.
* One standout solution utilized a massive ensemble of 260 different models, providing a fresh perspective on the limits of ensemble learning for predictive accuracy.
* Performance was largely driven by the ability to extract meaningful signals from anonymized data through rigorous cross-validation and creative feature interactions.

The results of the Toss Next ML Challenge suggest that even when anonymization strips away domain-specific context, meticulous feature engineering and robust tree-based architectures remain the gold standard for tabular data. For ML engineers, the competition underscores that the key to production-ready models lies in balancing complex ensembling with the strict latency requirements of real-time serving.
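A few of the sequence-derived variables described above (sequence length, unique token count, a transition-probability summary) can be sketched in plain Python. The comma-separated token format and feature names below are illustrative assumptions, not the competition's actual schema:

```python
from collections import Counter

def sequence_features(seq):
    """Derive hand-crafted features from a raw token sequence, in the
    spirit of the winning teams' sequence engineering."""
    tokens = seq.split(",")
    transitions = Counter(zip(tokens, tokens[1:]))
    total_transitions = max(len(tokens) - 1, 1)
    return {
        "seq_len": len(tokens),
        "unique_tokens": len(set(tokens)),
        # Probability that two consecutive tokens are identical,
        # a simple transition-probability summary.
        "repeat_prob": sum(c for (a, b), c in transitions.items() if a == b)
                       / total_transitions,
        # Share of the sequence taken up by its most frequent token.
        "most_common_token_share": Counter(tokens).most_common(1)[0][1]
                                   / len(tokens),
    }

feats = sequence_features("ad3,ad3,ad7,ad1,ad3,ad3")
print(feats)  # {'seq_len': 6, 'unique_tokens': 3, 'repeat_prob': 0.4, ...}
```

Fed into a boosting-tree model such as LightGBM, each of these scalars becomes one column; the winning teams reportedly extracted dozens of such columns from the single raw sequence.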

naver

[DAN25]

Naver recently released the full video archives from its DAN25 conference, highlighting the company’s strategic roadmap for AI agents, Sovereign AI, and digital transformation. The sessions showcase how Naver is moving beyond general AI applications to implement specialized, real-time systems that integrate large language models (LLMs) directly into core services like search, commerce, and content. By open-sourcing these technical insights, Naver demonstrates its progress in building a cohesive AI ecosystem capable of handling massive scale and complex user intent.

### Naver PersonA and LLM-Based User Memory

* The "PersonA" project focuses on building a "user memory" by treating fragmented logs across various Naver services as indirect conversations with the user.
* By leveraging LLM reasoning, the system transitions from simple data tracking to a sophisticated AI agent that offers context-aware, real-time suggestions.
* Technical hurdles addressed include the stable implementation of real-time log reflection for a massive user base and the selection of optimal LLM architectures for personalized inference.

### Trend Analysis and Search-Optimized Models

* The Place Trend Analysis system utilizes ranking algorithms to distinguish between temporary surges and sustained popularity, providing a balanced view of "hot places."
* LLMs and text mining are employed to move beyond raw data, extracting specific keywords that explain the underlying reasons for a location's trending status.
* To improve search quality, Naver developed search-specific LLMs that outperform general models by using specialized data "recipes" and integrating traditional information retrieval with features like "AI briefing" and "AuthGR" for higher reliability.

### Unified Recommendation and Real-Time CRM

* Naver Webtoon and Series replaced fragmented recommendation and CRM (Customer Relationship Management) models with a single, unified framework to ensure data consistency.
* The architecture shifted from batch-based processing to a real-time, API-based serving system to reduce management complexity and improve the immediacy of personalized user experiences.
* This transition focuses on maintaining a seamless UX by synchronizing different ML models under a unified serving logic.

### Scalable Log Pipelines and Infrastructure Stability

* The "Logiss" pipeline manages up to tens of billions of logs daily, utilizing a Storm and Kafka environment to ensure high availability and performance.
* Engineers implemented a multi-topology approach to allow for seamless, non-disruptive deployments even under heavy loads.
* Intelligent features such as "peak-shaving" (distributing peak traffic to off-peak hours), priority-based processing during failures, and efficient data sampling help balance cost, performance, and stability.

These sessions provide a practical blueprint for organizations aiming to scale LLM-driven services while maintaining infrastructure integrity. For developers and system architects, Naver’s transition toward unified ML frameworks and specialized, real-time data pipelines offers a proven model for moving AI from experimental phases into high-traffic production environments.
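A toy sketch of the peak-shaving and priority ideas: cap the number of log batches processed per time slot and let a priority queue spill lower-priority work into later, quieter slots. Logiss's real design is not public, so every name and number below is an illustrative assumption:

```python
import heapq

def schedule_logs(batches, capacity_per_slot):
    """Toy peak-shaving scheduler.

    `batches` is a list of time slots, each containing (priority, batch_id)
    arrivals. Each slot processes at most `capacity_per_slot` batches,
    highest priority first; overflow spills into the next slot, and any
    remainder drains into extra off-peak slots at the end.
    """
    heap = []   # (negated priority, arrival order, batch id) for stable pops
    order = 0
    slots = []
    for arriving in batches:
        for prio, bid in arriving:
            heapq.heappush(heap, (-prio, order, bid))
            order += 1
        take = min(capacity_per_slot, len(heap))
        slots.append([heapq.heappop(heap)[2] for _ in range(take)])
    while heap:  # drain leftovers into additional off-peak slots
        take = min(capacity_per_slot, len(heap))
        slots.append([heapq.heappop(heap)[2] for _ in range(take)])
    return slots

# A peak slot brings 4 batches but capacity is 2: billing logs (priority 2)
# go first, and the click logs are shaved into the following quiet slot.
slots = schedule_logs(
    [[(2, "billing-1"), (1, "click-1"), (2, "billing-2"), (1, "click-2")], []],
    capacity_per_slot=2,
)
print(slots)  # [['billing-1', 'billing-2'], ['click-1', 'click-2']]
```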

google

Reducing EV range anxiety: How a simple AI model predicts port availability

Google Research has developed a lightweight AI model designed to predict the probability of EV charging port availability at specific future intervals, directly addressing the "range anxiety" experienced by electric vehicle drivers. By co-designing the model with deployment infrastructure, researchers found that a simple linear regression approach outperformed more complex architectures like neural networks and decision trees. The resulting system effectively predicts availability changes during high-turnover periods, providing more reliable navigation and planning data than traditional "no-change" assumptions.

### Model Architecture and Feature Selection

* The development team prioritized a minimal feature set to ensure low-latency deployment and high speed in real-world navigational applications.
* After testing various architectures, a straightforward linear regression model was selected for its robustness and superior performance in this specific predictive task.
* The model was trained using real-time availability data from diverse geographical regions, specifically California and Germany, with an emphasis on larger charging stations that reflect high-traffic usage patterns.

### Temporal Feature Weights and Occupancy Trends

* The model uses the hour of the day as a primary feature, treating each hour as an independent variable to capture specific daily cycles.
* Learned numerical "weights" dictate the predicted rate of occupancy change: positive weights indicate ports are becoming occupied (e.g., during morning rush), while negative weights indicate ports are being freed up (e.g., during evening hours).
* The system is designed to only deviate from the current occupancy state when the change rate is statistically significant or when a station's large size amplifies the likelihood of a status change.

### Performance Benchmarking and Validation

* The model was evaluated against a "Keep Current State" baseline, which assumes future availability will be identical to the present status—a difficult baseline to beat, since port status remains unchanged roughly 90% of the time over 30-minute windows.
* Accuracy was measured using Mean Squared Error (MSE) and Mean Absolute Error (MAE) over 30-minute and 60-minute time horizons across 100 randomly selected stations.
* Testing confirmed that the linear regression model provides its greatest value during infrequent but critical moments of high turnover, successfully identifying when a station is likely to become full or available.

The success of this model demonstrates that sophisticated deep learning is not always the optimal solution for infrastructure challenges. By combining intuitive real-world logic—such as driver schedules and station capacity—with simple machine learning techniques, developers can create highly efficient tools that significantly improve the EV user experience without requiring massive computational overhead.
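Because each hour of the day is an independent one-hot feature, least-squares fitting reduces to the mean observed occupancy change per hour, which makes the approach easy to sketch in plain Python. The helper names and the significance threshold below are illustrative assumptions, not Google's actual implementation:

```python
def fit_hourly_weights(samples):
    """Fit per-hour weights for occupancy change.

    With one-hot hour-of-day features, ordinary least squares reduces
    to the mean observed change per hour: positive weights mean ports
    fill up, negative weights mean they free up.
    """
    sums, counts = [0.0] * 24, [0] * 24
    for hour, delta in samples:  # delta = observed change in occupied ports
        sums[hour] += delta
        counts[hour] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict(current_occupied, hour, weights, threshold=0.5):
    """Deviate from the current state only when the learned hourly rate
    is significant, mirroring the 'Keep Current State' baseline that
    dominates when nothing is changing."""
    w = weights[hour]
    if abs(w) < threshold:
        return current_occupied
    return current_occupied + round(w)

# Morning rush (hour 8) fills ports; a quiet hour (hour 14) changes nothing.
weights = fit_hourly_weights([(8, 2), (8, 1), (8, 3), (14, 0), (14, 0)])
print(predict(4, 8, weights))   # 6: two more ports expected to be occupied
print(predict(4, 14, weights))  # 4: keep current state
```

The thresholded prediction is what lets a model like this beat the 90%-unchanged baseline: it stays silent in the common static case and only speaks up during high-turnover hours.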