line

# Won't you become a hacker?

Hack Day 2025 serves as a cornerstone of LY Corporation’s engineering culture, bringing together diverse global teams to innovate beyond their daily operational scopes. By fostering a high-intensity environment focused on creative freedom, the event facilitates technical growth and strengthens interpersonal bonds across international branches. This 19th edition demonstrated how rapid prototyping and cross-functional collaboration can transform abstract ideas into functional AI-driven prototypes within a strict 24-hour window.

### Structure and Participation Dynamics

* The hackathon follows a "9 to 9" format, providing exactly 24 hours of development time followed by a day for presentations and awards.
* Participation is open to all roles, including developers, designers, planners, and HR staff, allowing for holistic product development.
* Teams can be "General Teams" from the same legal entity or "Global Mixed Teams" comprising members from different regions such as Korea, Japan, Taiwan, and Vietnam.
* The Developer Relations (DevRel) team facilitates team building for remote employees using digital collaboration tools like Zoom and Miro.

### AI-Powered Personality Analysis Project

* The author's team developed a "Scouter" program inspired by Dragon Ball, designed to measure professional "combat power" based on communication history.
* The system uses Slack bots and AI models to analyze message logs and map them to the Big 5 personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism).
* Professional metrics are visualized as game-like character statistics to make personality insights engaging and less intimidating.
* While the original plan involved using AI to generate and print physical character cards, hardware failures with photo printers forced a technical pivot to digital file downloads.

### High-Pressure Presentation and Networking

* Every team is allotted a strict 90-second window to pitch their product and run a live demo.
* The "90-second rule" includes a mandatory microphone cutoff to maintain momentum and keep the large-scale event engaging for all attendees.
* Dedicated booth sessions follow the presentations, allowing participants to provide hands-on experiences to colleagues and judges.
* The event emphasizes "Perfect the Details," a core company value, by encouraging teams to use all available resources—from whiteboards to AI image generators—within the time limit.

### Environmental Support and Culture

* The event occupies an entire office floor, providing a high-density yet comfortable environment designed to minimize distractions during the "Hack Time."
* Cultural exchange is encouraged through "humanity snacks," where participants from different global offices share local treats in dedicated rest areas.
* Strategic scheduling, such as "Travel Days" for international participants, ensures that teams can focus entirely on technical execution once the event begins.

Participating in internal hackathons provides a vital platform for testing new technologies—like LLMs and personality modeling—that may not fit into immediate product roadmaps. For organizations with hybrid work models, these intensive in-person events are highly recommended to bridge the communication gap and build lasting trust between global teammates.

line

# Flexible Multi-site Architecture Designed

LINE NEXT optimized its web server infrastructure by transitioning from fragmented, manual Nginx setups to a centralized native Nginx multi-site architecture. By integrating global configurations and automating the deployment pipeline with Ansible, the team reduced service launch lead times by over 80% while regaining the ability to use advanced features like GeoIP and real client IP tracking. This evolution ensures that the infrastructure can scale to support over 100 subdomains across diverse global services with high reliability and minimal manual overhead.

## Evolution of Nginx Infrastructure

* **PMC-based Structure**: The initial phase relied on a Project Management Console using `rsync` via SSH; this created security risks and led to fragmented, siloed configurations that were difficult to maintain.
* **Ingress Nginx Structure**: To improve speed, the team moved to Kubernetes-based Ingress using Helm charts, which automated domain and certificate settings but limited the use of native Nginx modules and complicated the retrieval of real client IP addresses.
* **Native Nginx Multi-site Structure**: The current hybrid approach uses native Nginx managed by Ansible, combining the speed of configuration-driven setups with the flexibility to use advanced modules like GeoIP and Loki for log collection.

## Configuration Integration and Multi-site Management

* **Master Configuration Extraction**: Common directives such as `timeouts`, `keep-alive` settings, and `log formats` were extracted into a master Nginx configuration file to eliminate redundancy across services.
* **Hierarchical Directory Structure**: Inspired by Apache, the team adopted a `sites-available` structure where individual `server` blocks for different services (alpha, beta, production) are managed in separate files.
* **Operational Efficiency**: This integrated structure allows a single Nginx instance to serve multiple sites simultaneously, significantly reducing the time required to add and deploy new service domains.

## Automated Deployment with Ansible

* **Standardized Workflow**: The team replaced manual processes with Ansible playbooks that handle everything from cloning the latest configuration from Git to extracting environment-specific files.
* **Safety and Validation**: The automated pipeline includes mandatory Nginx syntax verification (`nginx -t`) and process status checks to ensure stability before a deployment is finalized (a minimal sketch of this flow follows below).
* **Rolling Deployments**: To minimize service impact, updates are pushed sequentially across servers; the process automatically halts if an error is detected at any stage of the rollout.

To effectively manage a rapidly expanding portfolio of global services, infrastructure teams should move toward a "configuration-as-code" model that separates common master settings from service-specific logic. Leveraging automation tools like Ansible alongside a native Nginx multi-site structure provides the necessary balance between rapid deployment and the granular control required for complex logging and security requirements.
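As a rough illustration of those safety checks, here is a minimal Python sketch of the validate-then-reload flow; the post's actual automation is Ansible, and the config path and service name below are assumptions, not details from the article.

```python
"""Minimal sketch of a validate-then-reload step for one server.

Mirrors the safety checks described above (nginx -t, status check);
the path and service name are illustrative assumptions.
"""
import subprocess
import sys

NGINX_CONF = "/etc/nginx/nginx.conf"  # assumed master config path


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def deploy_one_server() -> None:
    # 1. Syntax-check the merged configuration before touching the service.
    check = run(["nginx", "-t", "-c", NGINX_CONF])
    if check.returncode != 0:
        sys.exit(f"nginx -t failed, aborting rollout:\n{check.stderr}")

    # 2. Graceful reload so in-flight requests are not dropped.
    reload_result = run(["systemctl", "reload", "nginx"])
    if reload_result.returncode != 0:
        sys.exit("reload failed; halting the rolling deployment here")

    # 3. Verify the process is healthy before moving to the next server.
    status = run(["systemctl", "is-active", "nginx"])
    if status.stdout.strip() != "active":
        sys.exit("nginx is not active after reload; halting rollout")


if __name__ == "__main__":
    deploy_one_server()
```

In a rolling deployment, a wrapper would run this host by host and stop the whole rollout on the first failure, matching the halt-on-error behavior described in the post.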

google

# Achieving 10,000x training data reduction with high-fidelity labels

Google Ads researchers have developed a scalable active learning curation process that reduces the volume of training data required for fine-tuning LLMs by up to four orders of magnitude. By iteratively identifying the most informative and diverse examples through clustering and expert review, the method achieves significantly higher human-model alignment than traditional large-scale crowdsourced datasets. This approach addresses the high costs and complexities of classifying ambiguous content, such as unsafe ads, where high-fidelity data is scarce and concept drift is frequent.

### The Iterative Curation Process

* **Initial Labeling:** The process begins with a zero- or few-shot model (LLM-0) that generates a large, typically imbalanced dataset of "positive" and "benign" labels.
* **Clustering and Confusion Identification:** Separate clusters are created for each label set; overlapping clusters indicate areas where the model is confused.
* **Expert Sampling:** Human experts review pairs of examples located near the decision boundary of these overlapping clusters, prioritizing those that cover a larger area of the search space to ensure diversity.
* **Recursive Refinement:** Expert labels are split into fine-tuning and evaluation sets; the model is retrained and the process repeats until model-human alignment plateaus or matches internal expert agreement.

### Measuring Alignment via Cohen’s Kappa

* **Metric Selection:** Because ad safety is often subjective, the researchers use Cohen’s Kappa instead of precision and recall to measure how well two independent annotators align beyond chance (a one-line computation is sketched below).
* **Performance Benchmarks:** A Kappa value above 0.8 is considered exceptional, while 0.4 is the minimum for acceptability.
* **Goal Alignment:** The curation process aims to move model performance toward the "ceiling" of internal human agreement (which measured between 0.78 and 0.81 in these experiments).

### Experimental Results and Efficiency

* **Model Scaling:** Experiments involved fine-tuning Gemini Nano-1 (1.8B parameters) and Nano-2 (3.25B parameters) on tasks of varying complexity.
* **Drastic Data Reduction:** The curated method reached performance plateaus using fewer than 500 expert-labeled examples, compared to a baseline of 100,000 crowdsourced labels.
* **Quality Gains:** Despite using 10,000x less data, the curated models saw up to a 65% improvement in alignment with human experts over the crowdsourced baselines.
* **Class Balancing:** The process naturally corrected for production imbalances, moving from <1% positive examples in raw traffic to ~40% in the final curated sets.

This curation method is a highly effective strategy for organizations managing high-stakes classification tasks where "ground truth" is subjective or data curation is prohibitively expensive. By shifting focus from data quantity to the quality and diversity of examples at the decision boundary, developers can maintain high-performing models that adapt quickly to evolving safety policies.
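For reference, Cohen's kappa is a one-liner with scikit-learn. The labels below are invented for illustration; the thresholds in the comment are the ones cited in the post.

```python
# Illustrative only: Cohen's kappa between two annotators (or a model
# and a human expert). Labels are made up; sklearn does the chance
# correction that plain percent-agreement misses.
from sklearn.metrics import cohen_kappa_score

model_labels  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
expert_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa_score(model_labels, expert_labels)
print(f"kappa = {kappa:.2f}")  # per the post: > 0.8 exceptional, < 0.4 unacceptable
```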

google

# Insulin resistance prediction from wearables and routine blood biomarkers

Researchers at Google have developed a machine learning approach to predict insulin resistance (IR) by integrating wearable device data with routine blood biomarkers. This method aims to provide a scalable, less invasive alternative to traditional "gold standard" tests like the euglycemic insulin clamp or specialized HOMA-IR assessments. The study demonstrates that combining digital biomarkers with common laboratory results can effectively identify individuals at risk for type 2 diabetes, particularly within high-risk populations.

## Barriers to Early Diabetes Screening

* Insulin resistance is a primary precursor to approximately 70% of type 2 diabetes cases, yet it often remains undetected until the disease has progressed.
* Current diagnostic standards are frequently omitted from routine check-ups due to high costs, invasiveness, and the requirement for specific insulin blood tests that are not standard practice.
* Early detection is vital because insulin resistance is often reversible through lifestyle modifications, making accessible screening tools a high priority for preventative medicine.

## The WEAR-ME Multimodal Dataset

* The research used the "WEAR-ME" study, which collected data from 1,165 remote participants across the U.S. via the Google Health Studies app.
* Digital biomarkers were gathered from Fitbit and Google Pixel Watch devices, tracking metrics such as resting heart rate, step counts, and sleep patterns.
* Clinical data was provided through a partnership with Quest Diagnostics, focusing on routine blood biomarkers like fasting glucose and lipid panels, supplemented by participant surveys on diet, fitness, and demographics.

## Predictive Modeling and Performance

* Deep neural network models were trained to estimate HOMA-IR scores (the standard formula is sketched below) by analyzing different combinations of the collected data streams.
* While models using only wearables and demographics achieved an area under the receiver operating characteristic curve (auROC) of 0.70, adding fasting glucose data boosted the auROC to 0.78.
* The most comprehensive models, which combined wearables, demographics, and full routine blood panels, achieved the highest accuracy across the study population.
* Performance was notably strong in high-risk sub-groups, specifically individuals with obesity or sedentary lifestyles.

## AI-Driven Interpretation and Literacy

* To assist with data translation, the researchers developed a prototype "Insulin Resistance Literacy and Understanding Agent" built on the Gemini family of large language models.
* The agent is designed to help users interpret their IR risk predictions and provide personalized, research-backed educational content.
* This AI integration aims to facilitate better communication between the data results and actionable health strategies, though it is currently intended for informational and research purposes.

By using ubiquitous wearable technology and existing clinical infrastructure, this approach offers a path toward proactive metabolic health monitoring. Integrating these models into consumer or clinical platforms could lower the barrier to early diabetes intervention and enable more personalized preventative care.
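The post does not restate the HOMA-IR formula, so the standard definition is shown below for context; the cut-off in the comment is a commonly used clinical rule of thumb, not a figure from the study. The study's contribution is predicting this score *without* the fasting-insulin draw the direct calculation requires.

```python
def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uU_ml: float) -> float:
    """Standard HOMA-IR estimate: (glucose [mg/dL] * insulin [uU/mL]) / 405.

    The models described above learn to approximate this score from
    wearable signals and routine labs instead of measuring insulin.
    """
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0


# Example: 100 mg/dL glucose, 12 uU/mL insulin -> ~2.96. Values around
# 2.5-3.0 are commonly used as insulin-resistance cut-offs.
print(round(homa_ir(100, 12), 2))
```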

google

# Highly accurate genome polishing with DeepPolisher: Enhancing the foundation of genomic research

DeepPolisher is a deep learning-based genome assembly tool designed to correct base-level errors with high precision, significantly enhancing the accuracy of genomic research. By leveraging a Transformer architecture to analyze sequencing data, the tool reduces total assembly errors by 50% and insertion or deletion (indel) errors by 70%. This advancement is critical for creating near-perfect reference genomes, such as the Human Pangenome Reference, which are essential for identifying disease-causing variants and understanding human evolution.

## Limitations of Current Sequencing Technologies

* Genome assembly relies on reading nucleotides (A, T, G, and C), but the microscopic scale of these base pairs makes accurate, large-scale sequencing difficult.
* Short-read sequencing methods provide high signal strength but are limited to a few hundred nucleotides because identical DNA clusters eventually desynchronize, blending signals together.
* Long-read technologies can sequence tens of thousands of nucleotides but initially suffered from high error rates (~10%); while tools like DeepConsensus have reduced this to 0.1%, further refinement is necessary for high-fidelity reference genomes.
* Even a 0.1% error rate results in millions of inaccuracies across the 3-billion-nucleotide human genome (a quick arithmetic check follows below), which can cause researchers to miss critical genetic markers or misidentify proteins.

## DeepPolisher Architecture and Training

* DeepPolisher is an open-source pipeline adapted from the DeepConsensus model, utilizing a Transformer-based neural network.
* The model was trained using a human cell line from the Personal Genomes Project that is estimated to be 99.99999% accurate, providing a "ground truth" for identifying and correcting errors.
* The system takes sequenced bases, their associated quality scores, and the orientation of the DNA strands as input to learn complex error patterns that traditional methods might miss.
* By combining sequence reads from multiple DNA molecules of the same individual, the tool iteratively "polishes" the assembly to reach the accuracy required for reference-grade data.

## Impact on Genomic Accuracy and Gene Discovery

* The tool’s ability to reduce indel errors by 70% is particularly significant, as these specific errors often interfere with the identification of protein-coding genes.
* DeepPolisher has already been integrated into major research efforts, including the enhancement of the Human Pangenome Reference, providing a more robust foundation for clinical diagnostics.
* Improved assembly accuracy allows for better mapping of highly repetitive regions of the genome, which were previously difficult to sequence and assemble confidently.

For researchers and bioinformaticians, DeepPolisher represents a vital step in moving from "draft" genomes to high-fidelity references. Adopting this tool in assembly pipelines can drastically improve the reliability of variant calling and gene annotation, especially in complex clinical and evolutionary studies.
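A quick arithmetic check of the error figures quoted above, using only the numbers given in the post:

```python
# Back-of-the-envelope check of the error counts cited in the summary.
GENOME_BP = 3_000_000_000            # ~3 billion nucleotides

raw_long_read  = GENOME_BP * 0.10    # ~10% early long-read error rate
post_consensus = GENOME_BP * 0.001   # ~0.1% after DeepConsensus

print(f"{raw_long_read:,.0f} errors at 10%")     # 300,000,000
print(f"{post_consensus:,.0f} errors at 0.1%")   # 3,000,000 -> "millions"

# DeepPolisher then halves total assembly errors (and cuts indels ~70%).
print(f"{post_consensus * 0.5:,.0f} errors after a further 50% reduction")
```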

line

# Replacing the Payment System DB Handling

The LINE Billing Platform migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools.

### Protocol Selection and Implementation

- The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing.
- Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol.
- Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate (a connection sketch follows below).

### Keyspace Architecture and Data Routing

- The system uses a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data.
- The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform.
- The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history.
- VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead.

### MySQL Compatibility and Transaction Logic

- Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`.
- Advanced features like two-phase commit (2PC) are available for handling distributed transactions across multiple shards.
- Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility.
- Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces, along with specific unsupported SQL cases documented in the Vitess core.

### Automated Operations and Monitoring

- The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or stopped replication.
- Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000).
- Real-time alerts are routed through Slack and email, using the `tablet_alias` to identify exactly which MySQL node or VTTablet is experiencing issues.
- A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time.

For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.
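As a hedged sketch of what "MySQL protocol via VTGate" looks like from application code, here is a PyMySQL version (the team's stack is Java; PyMySQL simply demonstrates that any stock MySQL driver works). The host, port, keyspace, and table names are illustrative assumptions.

```python
# Sketch: talking to Vitess through VTGate's MySQL-protocol endpoint
# with a stock driver. VTGate, not a MySQL server, answers this port.
import pymysql

conn = pymysql.connect(
    host="vtgate.example.internal",  # assumed VTGate address
    port=15306,                      # commonly used VTGate MySQL port
    user="billing",
    password="change-me",
    database="service_keyspace",     # the sharded keyspace
)

with conn.cursor() as cur:
    # Including the sharding key in the WHERE clause lets VTGate route
    # the query to a single shard instead of scattering it to all N.
    cur.execute(
        "SELECT balance FROM coin_balance WHERE user_id = %s",
        (12345,),
    )
    print(cur.fetchone())
conn.close()
```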

google

# MLE-STAR: A state-of-the-art machine learning engineering agent

MLE-STAR is a state-of-the-art machine learning engineering agent designed to automate complex ML tasks by treating them as iterative code optimization challenges. Unlike previous agents that rely solely on an LLM’s internal knowledge, MLE-STAR integrates external web searches and targeted ablation studies to pinpoint and refine specific pipeline components. This approach allows the agent to achieve high-performance results, evidenced by its ability to win medals in 63% of Kaggle competitions within the MLE-Bench-Lite benchmark.

## External Knowledge and Targeted Ablation

The core of MLE-STAR’s effectiveness lies in its ability to move beyond generic machine learning libraries by incorporating external research and specific performance testing.

* The agent uses web search to retrieve task-specific, state-of-the-art models and approaches rather than defaulting to familiar libraries like scikit-learn.
* Instead of modifying an entire script at once, the system conducts an ablation study to evaluate the impact of individual pipeline components, such as feature engineering or model selection (a toy version of this loop is sketched below).
* By identifying which code blocks have the most significant impact on performance, the agent can focus its reasoning and optimization efforts where they are most needed.

## Iterative Refinement and Intelligent Ensembling

Once the critical components are identified, MLE-STAR employs a specialized refinement process to maximize the effectiveness of the generated solution.

* Targeted code blocks undergo iterative refinement based on LLM-suggested plans that incorporate feedback from prior experimental failures and successes.
* The agent features a unique ensembling strategy where it proposes multiple candidate solutions and then designs its own method to merge them.
* Rather than using simple validation-score voting, the agent iteratively improves the ensemble strategy itself, treating the combination of models as a distinct optimization task.

## Robustness and Safety Verification

To ensure the generated code is both functional and reliable for real-world deployment, MLE-STAR incorporates three specialized diagnostic modules.

* **Debugging Agent:** Automatically analyzes tracebacks and execution errors in Python scripts to provide iterative corrections.
* **Data Leakage Checker:** Reviews the solution script prior to execution to ensure the model does not improperly access test dataset information during the training phase.
* **Data Usage Checker:** Analyzes whether the script is utilizing all available data sources, preventing the agent from overlooking complex data formats in favor of simpler files like CSVs.

By combining external grounding with a granular, component-based optimization strategy, MLE-STAR represents a significant shift in automated machine learning. For organizations looking to scale their ML workflows, such an agent suggests a future where the role of the engineer shifts from manual coding to high-level supervision of autonomous agents that can navigate the vast landscape of research and data engineering.
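A toy sketch of the ablation idea: disable one pipeline component at a time and rank components by how much the validation score drops. The component names, scores, and `run_pipeline` stand-in are invented for illustration; this is not MLE-STAR's code.

```python
# Toy ablation loop: the component whose removal hurts the score most
# is the one worth refining first. All numbers here are made up.
def run_pipeline(disabled: set[str]) -> float:
    """Stand-in for executing the generated script; returns a val score."""
    base = 0.80
    impact = {"feature_engineering": 0.07, "model_selection": 0.10, "ensembling": 0.02}
    return base + sum(v for k, v in impact.items() if k not in disabled)


components = ["feature_engineering", "model_selection", "ensembling"]
full_score = run_pipeline(disabled=set())

# Score drop when each component is ablated in isolation.
deltas = {c: full_score - run_pipeline(disabled={c}) for c in components}
target = max(deltas, key=deltas.get)  # refine the highest-impact block first
print(deltas, "->", target)           # -> model_selection
```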

google

# Simulating large systems with Regression Language Models

Researchers from Google have introduced Regression Language Models (RLMs) as a universal solution for numeric prediction tasks by framing regression as a text-to-text problem. By converting complex, unstructured system data into strings, RLMs can predict performance metrics without the need for manual feature engineering or data normalization. This approach allows language models to move beyond subjective human feedback and directly model raw operational data for large-scale software and industrial infrastructures.

## Conceptualizing Text-to-Text Regression

* Traditional regression methods rely on tabular data—fixed-length numeric vectors—which are difficult and laborious to maintain for evolving systems like software logs or hardware patterns.
* RLMs represent the input state ($x$) as a structured text string (such as JSON or YAML) and the numerical output ($y$) as a text string (a toy version of this formatting is sketched below).
* The model is trained using standard next-token prediction and cross-entropy loss, allowing it to function as a universal approximator for complex data types.
* This paradigm eliminates the need for manual feature engineering, as the model learns directly from the raw textual representation of the system state.

## Architecture and Training for Large Systems

* The research uses a compact RLM consisting of a two-layer encoder-decoder architecture with 60 million parameters.
* To manage inputs that can reach up to 1 million tokens, the system reorders features by importance at the beginning of the string so that critical data is preserved when truncated to the model's 8k-token limit.
* Pre-training the RLM on diverse regression tasks enables few-shot adaptation, allowing the model to adjust to new data types with minimal gradient updates.
* Numerical values are processed as-is within the text, removing the requirement for traditional scaling or normalization common in standard machine learning pipelines.

## Optimizing Google's Borg Infrastructure

* The method was applied to Google’s Borg system to predict MIPS per GCU (millions of instructions per second per Google Compute Unit), a vital efficiency metric.
* The RLM simulates the outcomes of complex bin-packing algorithms within a "digital twin" framework to optimize resource allocation across CPUs and TPUs.
* By analyzing execution traces and textual metadata, the model provides high-accuracy forecasting for diverse workloads including Gmail, YouTube, and Maps.

## Density Capture and Uncertainty Modeling

* Unlike traditional regressors that produce a single point estimate, RLMs can capture full probability distributions by sampling the decoded output multiple times.
* This density estimation is critical for modeling aleatoric uncertainty, which represents the inherent randomness and stochastic load demands of large-scale compute environments.
* Visualizing these distributions helps engineers identify the range of possible outcomes and the inherent variability of the system's performance over time.

This research demonstrates that small, specialized language models can effectively replace traditional regression methods in highly dynamic environments. For practitioners looking to implement these capabilities, the open-source `regress-lm` library provides a framework for simulating large systems and predicting performance across varied industrial and scientific use cases.
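A generic sketch of the text-to-text framing under the assumptions stated above: JSON serialization, importance-first ordering so truncation drops the least important features, and repeated decoding to obtain a distribution rather than a point estimate. This is an illustration, not the `regress-lm` API, and the feature names are invented.

```python
# Generic sketch of the x -> string, y -> string regression framing.
import json


def serialize_state(state: dict, importance: list[str], max_chars: int) -> str:
    """Put high-importance features first so they survive truncation."""
    ordered = {k: state[k] for k in importance if k in state}
    ordered.update({k: v for k, v in state.items() if k not in ordered})
    # Truncation mirrors the fixed token budget; the tail may be cut off.
    return json.dumps(ordered)[:max_chars]


state = {"job": "video-transcode", "cell": "gq", "cpus": 1200, "priority": 200}
x_text = serialize_state(state, importance=["cpus", "priority"], max_chars=8_000)

# Decoding the output several times yields samples from the predicted
# distribution instead of a single point estimate (values invented).
y_texts = ["412.3", "405.9", "410.0"]
samples = [float(t) for t in y_texts]
print(x_text, samples)
```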

line

# Introducing a case study of

LY Corporation’s ABC Studio developed a specialized retail Merchant system by leveraging Domain-Driven Design (DDD) to overcome the functional limitations of a legacy food-delivery infrastructure. The project demonstrates that the primary value of DDD lies not just in technical implementation, but in aligning organizational structures and team responsibilities with domain boundaries. By focusing on the roles and responsibilities of the system rather than just the code, the team created a scalable platform capable of supporting diverse consumer interfaces.

### Redefining the Retail Domain

* The legacy system treated retail items like restaurant entries, creating friction for specialized retail services; the new system was built to be a standalone platform.
* The team narrowed the domain focus to five core areas: Shop, Item, Category, Inventory, and Order.
* Sales-specific logic, such as coupons and promotions, was delegated to external "Consumer Platforms," allowing the Merchant system to serve as a high-performance information provider.

### Clean Architecture and Modular Composition

* The system uses Clean Architecture to ensure domain entities remain independent of external frameworks, which also provided a manageable learning curve for new team members (a minimal sketch of this split follows below).
* Services are split into two distinct modules: "API" modules for receiving external requests and "Engine" modules for processing business logic.
* Communication between these modules is handled asynchronously via gRPC and Apache Kafka, using the Decaton library to increase throughput while maintaining a low partition count.
* The architecture prioritizes eventual consistency, allowing for high responsiveness and scalability across the platform.

### Global Collaboration and Conway’s Law

* Development was split between teams in Korea (Core Domain) and Japan (System Integration and BFF), requiring a shared understanding of domain boundaries.
* Architectural Decision Records (ADRs) were introduced to document critical decisions and prevent "knowledge drift" during long-term collaboration.
* The organizational structure was intentionally designed to mirror the system architecture, with specific teams (Core, Link, BFF, and Merchant Link) assigned to distinct domain layers.
* This alignment, reflecting Conway’s Law, ensures that changes to external consumer platforms have minimal impact on the stable core domain logic.

Successful DDD adoption requires moving beyond technical patterns like hexagonal architecture and focusing on establishing a shared understanding of roles across the organization. By structuring teams to match domain boundaries, companies can build resilient systems where the core business logic remains protected even as the external service ecosystem evolves.
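A minimal sketch of the Clean Architecture split described above, written in Python purely for illustration (all names are invented, and this is not the production code): the domain entity knows nothing about Kafka or gRPC, and the Engine-side use case depends only on a port interface that a messaging adapter would implement.

```python
# Sketch: framework-free domain entity + a port for eventual consistency.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Item:
    """Core domain entity; no framework imports, easy to unit-test."""
    item_id: str
    stock: int

    def reserve(self, qty: int) -> None:
        if qty > self.stock:
            raise ValueError("insufficient inventory")
        self.stock -= qty


class ItemEventPublisher(Protocol):
    """Port; a Kafka (or other) adapter lives outside the domain."""
    def publish(self, event: dict) -> None: ...


def reserve_item(item: Item, qty: int, publisher: ItemEventPublisher) -> None:
    """Engine-module use case: mutate the domain, then emit an event so
    consumer platforms converge via eventual consistency."""
    item.reserve(qty)
    publisher.publish({"type": "ItemReserved", "item_id": item.item_id, "qty": qty})


class PrintPublisher:
    def publish(self, event: dict) -> None:
        print("->", event)


reserve_item(Item("sku-1", stock=5), qty=2, publisher=PrintPublisher())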

google

# SensorLM: Learning the language of wearable sensors

SensorLM is a new family of foundation models designed to bridge the gap between high-dimensional wearable sensor data and natural language descriptions. By training on a massive dataset of nearly 60 million hours of de-identified health data, the models learn to interpret complex physiological signals and provide meaningful context for human activities. This research demonstrates that integrating multimodal sensor signals with language models enables sophisticated health insights, such as zero-shot activity recognition and automated health captioning, that significantly outperform general-purpose large language models.

## Dataset Scale and Automated Annotation

* The models were pre-trained on an unprecedented 59.7 million hours of multimodal sensor data collected from over 103,000 individuals across 127 countries.
* To overcome the high cost of manual annotation, researchers developed a hierarchical pipeline that automatically generates text descriptions by calculating statistics and identifying trends within the raw sensor streams.
* Data was sourced from Fitbit and Pixel Watch devices, representing nearly 2.5 million person-days of activity and health information.

## Hybrid Training Architecture

* SensorLM unifies two primary multimodal strategies: contrastive learning and generative pre-training (a toy version of the contrastive objective is sketched below).
* Through contrastive learning, the model learns to discriminate between different states—such as a "light swim" versus a "strength workout"—by matching sensor segments to their corresponding text descriptions.
* The generative component allows the model to "speak" for the sensors, producing nuanced, context-aware natural language captions directly from high-dimensional biometric signals.

## Activity Recognition and Cross-Modal Capabilities

* The model demonstrates state-of-the-art performance in zero-shot human activity recognition, accurately classifying 20 different activities without any task-specific fine-tuning.
* Its few-shot learning capabilities allow the model to adapt to new tasks or individual user patterns with only a handful of examples.
* SensorLM facilitates cross-modal retrieval, enabling users or experts to find specific sensor patterns using natural language queries, or to generate descriptions based on specific sensor inputs.

## Generative Health Captioning

* Beyond simple classification, the model can generate hierarchical captions that describe the statistical, structural, and semantic dimensions of a user’s data.
* Experimental results using metrics like BERTScore show that SensorLM produces captions that are more factually correct and coherent than those created by powerful non-specialist LLMs.
* This capability allows the translation of abstract data points, such as heart rate variability or step counts, into readable summaries that explain the "why" behind physiological changes.

By providing a framework where wearable data can be understood through the lens of human language, SensorLM paves the way for more intuitive and personalized health monitoring. This technology holds the potential to transform raw biometric streams into actionable insights, helping users better understand the relationship between their activities and their overall physical well-being.
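A toy NumPy version of the CLIP-style contrastive objective described above: paired sensor and text embeddings should match each other against all other pairs in the batch. The batch size, embedding dimension, and 0.07 temperature are illustrative assumptions, not values from the paper.

```python
# Toy contrastive loss between sensor and text embeddings.
import numpy as np

rng = np.random.default_rng(0)
B, D = 8, 64  # batch of 8 (sensor segment, caption) pairs, 64-dim embeddings

sensor = rng.normal(size=(B, D))
text = rng.normal(size=(B, D))
sensor /= np.linalg.norm(sensor, axis=1, keepdims=True)  # unit-normalize
text /= np.linalg.norm(text, axis=1, keepdims=True)

logits = sensor @ text.T / 0.07  # pairwise similarities with temperature
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))  # the correct pair sits on the diagonal
print(f"contrastive loss: {loss:.3f}")
```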

google

# Synthetic and federated: Privacy-preserving domain adaptation with LLMs for mobile applications

Researchers at Google have developed a framework for improving both small and large language models (LMs) in mobile applications like Gboard by using privacy-preserving synthetic data and federated learning. This approach combines differential privacy (DP) with LLM-based data generation to minimize memorization risks while achieving significant gains in production metrics like next-word prediction and proofreading. The result is a robust pipeline that allows models to adapt to specific user domains without compromising individual privacy or requiring centralized data storage.

### Strengthening Privacy with DP-FL

* Gboard has transitioned all production LMs trained on user data to a Federated Learning with Differential Privacy (DP-FL) framework, ensuring data remains on-device and is never memorized.
* The deployment uses the **BLT-DP-FTRL** algorithm, which offers an optimized trade-off between privacy guarantees and model utility while being easier to deploy in production.
* Engineers adopted the **SI-CIFG** model architecture to facilitate efficient on-device training, ensuring the hardware can handle local updates while maintaining compatibility with DP constraints.

### Synthetic Data Generation via Public LLMs

* Powerful LLMs trained on public web data are prompted to synthesize high-quality text that mimics mobile user interactions without ever accessing actual private user data.
* The process involves a two-step prompting strategy: first, filtering public datasets to identify topics common in mobile communication, and second, generating new, domain-specific text based on those patterns (sketched below).
* This synthetic data serves as a bridge for pre-training small LMs, which are then refined through private on-device post-training to capture the nuances of user behavior.

### Adapting LLMs for Mobile Proofreading

* To support advanced features like Gboard's "Proofread," researchers developed a "Synthesize-then-Adapt" pipeline specifically for error correction.
* LLMs generate synthetic "corrupted" text to simulate common mobile typing errors, providing the error/correction training pairs that are difficult to find in public datasets.
* Federated learning is then used to adapt these error-correction models to specific app domains (such as messaging or email) using on-device signals, ensuring the model understands the specific context of the user's typing.

The success of these techniques in Gboard demonstrates that synthetic data can effectively replace or augment private data throughout the machine learning lifecycle. For developers working with sensitive user information, adopting a "synthetic-first" approach combined with federated learning provides a scalable path to model improvement that adheres to the core principles of data minimization and anonymization.
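A sketch of the two-step prompting strategy described above. The `generate` function is a hypothetical stand-in for a call to a public LLM, and the snippets, topic, and prompt wording are all invented; only public text is ever involved, matching the privacy constraint in the post.

```python
# Two-step prompting sketch: (1) mine topics from public text,
# (2) synthesize fresh domain-specific training text per topic.
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a public LLM call; not a real API."""
    raise NotImplementedError


public_snippets = [
    "anyone know a good pizza place downtown?",
    "meeting got moved to 3pm, can you still make it?",
]

# Step 1: surface topics typical of mobile communication from *public*
# snippets; no private user data is ever seen.
topic_prompt = (
    "From these public snippets, list topics that resemble everyday "
    "mobile chat (scheduling, greetings, quick questions):\n"
    + "\n".join(public_snippets)
)

# Step 2: synthesize new, domain-specific text for one mined topic.
topic = "scheduling"  # would come from the step-1 response
synthesis_prompt = (
    f"Write 20 short, informal messages a phone user might type about {topic}."
)
```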

line

# Milvus: Building a

LINE VOOM transitioned its recommendation system from a batch-based offline process to a real-time infrastructure to solve critical content freshness issues. By adopting Milvus, an open-source vector database, the team enabled the immediate indexing and searching of new video content as soon as it is uploaded. This implementation ensures that time-sensitive posts are recommended to users without the previous 24-hour delay, significantly enhancing user engagement.

### Limitations of the Legacy Recommendation System

* The original system relied on daily offline batch processing for embedding generation and similarity searches.
* New content, such as holiday greetings or trending sports clips, suffered from a lack of immediacy, often taking up to a full day to appear in user feeds.
* To improve the user experience, the team needed to shift from offline candidate pools to an online system capable of real-time approximate nearest neighbor (ANN) searches.

### Selecting Milvus as the Vector Database

* The team evaluated Milvus and Qdrant based on performance, open-source status, and on-premise compatibility.
* Milvus was selected for its superior performance, handling 2,406 requests per second compared to Qdrant's 326, with lower query latency (1 ms vs. 4 ms).
* Key architectural advantages of Milvus included the separation of storage and computing, support for both stream and batch inserts, and a diverse range of supported in-memory index types.

### Reliability Verification via Chaos Testing

* Given the complexity of Milvus clusters, the team performed chaos testing by intentionally injecting failures like pod kills and scaling events.
* Tests revealed critical vulnerabilities: killing the `Querycoord` led to collection release and search failure, while losing the `Etcd` quorum caused total metadata loss.
* These findings highlighted the need for robust high-availability (HA) configurations to prevent service interruptions during component failures.

### High Availability (HA) Implementation Strategies

* **Collection-Level HA:** To prevent search failures during coordinator issues, the team implemented a dual-writing system where embeddings are recorded in two separate collections simultaneously.
* **Alias Switching:** Client applications reference collections through an "alias"; if the primary collection becomes unavailable, the system instantly switches the alias to the backup collection to minimize downtime (a pymilvus sketch follows below).
* **Coordinator-Level HA:** To eliminate single points of failure, coordinators (such as `Indexcoord`) were configured in an active-standby mode, ensuring a backup is always ready to take over management tasks.

To successfully deploy a large-scale real-time recommendation engine, it is critical to select a vector database that decouples storage from compute and to implement multi-layered high-availability strategies, such as dual-collection writing and active-standby coordinators, to ensure production stability.
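A sketch of the alias-based failover using pymilvus, which provides `create_alias` and `alter_alias` for exactly this pattern. The cluster address and the collection and alias names are illustrative assumptions.

```python
# Sketch: repoint an alias from the primary to the backup collection so
# clients (which only ever query the alias) never notice the switch.
from pymilvus import connections, utility

connections.connect(host="milvus.example.internal", port="19530")

PRIMARY, BACKUP, ALIAS = "posts_a", "posts_b", "posts"  # dual-written pair

if not utility.has_collection(BACKUP):
    raise SystemExit("backup collection missing; dual-write is broken")

# First-time setup would be: utility.create_alias(PRIMARY, ALIAS)

# Failover: move the existing alias to the backup collection.
utility.alter_alias(collection_name=BACKUP, alias=ALIAS)
```

Because every client searches `posts` rather than a concrete collection name, this single metadata operation is the entire failover, which is what keeps downtime near zero.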

google

# LSM-2: Learning from incomplete wearable sensor data

LSM-2 introduces a paradigm shift in processing wearable sensor data by treating naturally occurring data gaps as inherent features rather than errors to be corrected. By using the Adaptive and Inherited Masking (AIM) framework, the model learns directly from fragmented, real-world data streams without the need for biased imputation or data-discarding filters. This approach allows LSM-2 to achieve state-of-the-art performance in health-related classification and regression tasks, maintaining robustness even when sensors fail or data is highly interrupted.

## The Challenge of Pervasive Missingness

* Real-world wearable data is almost never continuous; factors such as device charging, motion artifacts, and battery-saving modes create frequent "missingness."
* Traditional self-supervised learning models require complete data, forcing researchers to use imputation—which can introduce artificial bias—or aggressive filtering that discards over 90% of potentially useful samples.
* In a dataset of 1.6 million day-long windows, not a single sample had 0% missingness, highlighting the impracticality of training only on complete data.

## Adaptive and Inherited Masking (AIM)

* AIM extends the Masked Autoencoder (MAE) framework by treating "inherited" masks (naturally occurring gaps) and "artificial" masks (training objectives) as equivalent (a toy version follows below).
* The framework uses a dual masking strategy: it applies token dropout to a fixed ratio of tokens to keep encoding computationally efficient.
* To handle the unpredictable and variable nature of real-world gaps, AIM uses attention masking within the transformer blocks for any remaining masked tokens.
* During evaluation and fine-tuning, the model relies solely on attention masking to navigate naturally occurring gaps, allowing for accurate physiological modeling without filling in missing values.

## Scale and Training Architecture

* LSM-2 was trained on a massive dataset comprising 40 million hours of de-identified wearable data from more than 60,000 participants using Fitbit and Google Pixel devices.
* The model learns the underlying physiological structure by reconstructing masked segments across multimodal inputs, including heart signals, sleep patterns, and activity levels.
* Because it is trained on fragmented data, the resulting foundation model is significantly more resilient to sensor dropouts in downstream tasks like hypertension prediction or stress monitoring.

LSM-2 demonstrates that foundation models for health should be built to embrace the messiness of real-world environments. By integrating missingness directly into the self-supervised learning objective, developers can bypass the computational and statistical overhead of imputation while building more reliable diagnostic and monitoring tools.
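A toy version of AIM's dual masking in NumPy. The window length and mask ratios are invented, and restricting the reconstruction loss to artificial masks is an assumption that follows from inherited gaps having no ground truth to reconstruct.

```python
# Toy sketch: merge inherited masks (real sensor gaps) with artificial
# masks (training targets) into one attention mask for the transformer.
import numpy as np

rng = np.random.default_rng(0)
T = 16                                    # tokens in a day-long window

inherited = rng.random(T) < 0.3           # gaps already present in the data
candidates = np.flatnonzero(~inherited)   # only observed tokens can become
artificial = np.zeros(T, dtype=bool)      # artificial reconstruction targets
picked = rng.choice(candidates, size=len(candidates) // 2, replace=False)
artificial[picked] = True

attention_mask = ~(inherited | artificial)  # True = token is attendable
targets = artificial                        # loss only where ground truth
                                            # exists (not on inherited gaps)
print(attention_mask.astype(int))
print(targets.astype(int))
```

At evaluation time only `inherited` would be non-empty, matching the post's note that fine-tuning and inference rely solely on attention masking over natural gaps.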

line

# Getting 200%

Riverpod is a powerful state management library for Flutter designed to overcome the limitations of its predecessor, Provider, by offering a more flexible and robust framework. By decoupling state from the widget tree and providing built-in support for asynchronous data, it significantly reduces boilerplate code and improves application reliability. Ultimately, it allows developers to focus on logic rather than the complexities of manual state synchronization and resource management.

### Modern State Management Architecture

Riverpod introduces a streamlined approach to state by separating the logic into Models, Providers, and Views. Unlike the standard `setState` approach, Riverpod manages the lifecycle of state automatically, ensuring resources are allocated and disposed of efficiently.

* **Providers as Logic Hubs:** Providers define how state is built and updated, supporting synchronous data, Futures, and Streams.
* **Consumer Widgets:** Views use `ref.watch` to subscribe to data and `ref.read` to trigger actions, creating a clear reactive loop.
* **Global Access:** Because providers are not tied to the widget hierarchy, they can be accessed from anywhere in the app without passing context through multiple layers.

### Optimization for Server Data and Asynchronous Logic

One of Riverpod's strongest advantages is its native handling of server-side data, which typically requires manual logic in other libraries. It simplifies the user experience during network requests by providing built-in states for loading and error handling.

* **Resource Cleanup:** Using `ref.onDispose`, developers can automatically cancel active API calls when a provider is no longer needed, preventing memory leaks and unnecessary network usage.
* **State Management Utilities:** It natively supports pull-to-refresh functionality through `ref.refresh` and allows for custom data-expiration settings.
* **AsyncValue Integration:** Riverpod wraps asynchronous data in an `AsyncValue` object, making it easy to check whether a provider `hasValue`, `hasError`, or `isLoading` directly within the UI.

### Advanced State Interactions and Caching

Beyond basic data fetching, Riverpod allows providers to interact with each other to create complex, reactive workflows (a generic sketch of this pattern follows below). This is particularly useful for features like search filters or multi-layered data displays.

* **Cross-Provider Subscriptions:** A provider can "watch" another provider; for example, a `PostList` provider can automatically rebuild itself whenever a `Filter` provider's state changes.
* **Strategic Caching:** Developers can implement "instant" page transitions by yielding cached data from a list provider to a detail provider immediately, then updating the UI once the full network request completes.
* **Offline-First Capabilities:** By combining local database streams with server-side Futures, Riverpod can display local data first, ensuring a seamless user experience regardless of network connectivity.

### Seamless Data Synchronization

Maintaining consistency across different screens is simplified through Riverpod's centralized state. When a user interacts with a data point on one screen—such as "starring" a post on a detail page—the change can be propagated globally so that the main list view updates instantly without additional manual refreshes. This synchronization keeps the UI a "single source of truth" across the entire application.

For developers building data-intensive Flutter applications, Riverpod is a highly recommended choice. Its ability to handle complex asynchronous states and inter-provider dependencies with minimal code makes it an essential tool for creating scalable, maintainable, and high-performance mobile apps.
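Riverpod itself is Dart; since the code samples in this digest are Python, the sketch below shows only the cross-provider invalidation idea in a generic, language-agnostic form. None of these names are Riverpod's API, and this is not how Riverpod is implemented.

```python
# Generic sketch of the "watch" pattern: a provider caches its value
# and is invalidated (rebuilt lazily) when a watched provider changes.
from typing import Any, Callable


class Provider:
    def __init__(self, build: Callable[["Provider"], Any]):
        self._build, self._value, self._dirty = build, None, True
        self._dependents: list["Provider"] = []

    def watch(self, other: "Provider") -> Any:
        if self not in other._dependents:
            other._dependents.append(self)  # subscribe for invalidation
        return other.read()

    def read(self) -> Any:
        if self._dirty:                     # lazy rebuild, then cache
            self._value, self._dirty = self._build(self), False
        return self._value

    def set(self, value: Any) -> None:
        self._value, self._dirty = value, False
        for dep in self._dependents:        # invalidate all watchers
            dep._dirty = True


filter_provider = Provider(lambda ref: "all")
post_list = Provider(lambda ref: f"posts filtered by {ref.watch(filter_provider)}")

print(post_list.read())         # posts filtered by all
filter_provider.set("starred")  # changing the filter invalidates post_list
print(post_list.read())         # posts filtered by starred
```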

google

# Measuring heart rate with consumer ultra-wideband radar

Google Research has demonstrated that ultra-wideband (UWB) radar technology, already integrated into many modern smartphones for tasks like precise location and vehicle unlocking, can be repurposed for contactless heart rate monitoring. By employing a transfer learning approach, researchers successfully applied models trained on large datasets from Frequency Modulated Continuous Wave (FMCW) radar to the newer UWB systems. This development suggests that everyday consumer electronics could soon provide accurate vital sign measurements without additional specialized sensors or physical contact.

## Leveraging Existing Consumer Hardware

While Google previously used Soli radar (FMCW) for sleep sensing in the Nest Hub, UWB represents a more widely available hardware platform in the mobile market.

* UWB is currently used primarily for non-radar applications like digital car keys and item tracking (e.g., Apple AirTags).
* The technology is increasingly standard in high-end mobile phones, providing a ready-made infrastructure for health sensing.
* Using existing UWB chips eliminates the need for manufacturers to add dedicated medical sensors to devices.

## Overcoming Signal Interference in Vital Sensing

The primary challenge in radar-based heart rate monitoring is that the micro-movements of the chest wall caused by a heartbeat are far smaller than movements caused by breathing or general body shifts.

* The system uses three-dimensional spatial resolution to create a "measurement zone" focused specifically on the user's torso.
* High temporal resolution, sampling at rates up to 200 Hz, allows the radar to capture the rapid, subtle pulses of a heartbeat.
* By isolating reflections from the chest area, the radar can ignore stationary background objects and external movements that would otherwise corrupt the data.

## Cross-Radar Transfer Learning

Because the researchers possessed extensive datasets for FMCW radar but very limited data for UWB, they developed a method to transfer learned features between radar types despite their different physical principles.

* FMCW radar transmits continuous sinusoidal waves, whereas UWB radar transmits extremely short pulses (picoseconds to nanoseconds).
* The study used a large 980-hour FMCW dataset to "teach" the model the characteristics of human vital signs.
* This pre-trained knowledge was then applied to a smaller 37.3-hour UWB dataset, showing that heart rate features are consistent enough across hardware types for effective transfer learning.

## A Novel Spatio-Temporal Deep Learning Model

The researchers designed a custom neural network architecture to process the complex multidimensional data generated by radar sensors (a shape-level sketch follows below).

* The framework uses a 2D ResNet to analyze the input data across two axes: time and spatial measurements.
* Following the initial analysis, the model uses average pooling to collapse the spatial dimension, focusing purely on the temporal signal.
* A 1D ResNet then identifies long-range periodic patterns to estimate the heart rate.
* The model achieved a mean absolute error (MAE) of 0.85 beats per minute (bpm), a 50% reduction in error compared to previous state-of-the-art methods.

This research indicates that high-precision health monitoring can be integrated into the mobile devices users already carry. By transforming smartphones into passive health sensors, UWB technology could allow continuous heart rate tracking during routine activities, such as sitting at a desk or holding a phone in one's lap.
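A shape-level PyTorch sketch of the 2D-then-1D design described above. The simple conv stacks stand in for the 2D and 1D ResNets, and all layer sizes and the input shape are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: 2D convs over (time, range bins), average-pool away the
# spatial axis, then 1D convs over time to pick out periodic structure.
import torch
import torch.nn as nn


class RadarHeartRateNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = nn.Sequential(            # stand-in for the 2D ResNet
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.temporal = nn.Sequential(           # stand-in for the 1D ResNet
            nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)             # regress beats per minute

    def forward(self, x):                        # x: (batch, 1, time, range_bins)
        h = self.spatial(x)                      # (batch, 32, time, range_bins)
        h = h.mean(dim=3)                        # collapse the spatial axis
        h = self.temporal(h)                     # (batch, 32, 1)
        return self.head(h.squeeze(-1))          # (batch, 1)


bpm = RadarHeartRateNet()(torch.randn(2, 1, 200, 16))
print(bpm.shape)  # torch.Size([2, 1])
```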