discord

Discord Patch Notes: August 4, 2025

Discord's "Patch Notes" series serves as a regular communication channel for documenting technical enhancements across performance, reliability, and platform responsiveness. The initiative emphasizes a collaborative development cycle where engineering fixes are transparently reported alongside invitations for community involvement in the debugging process.

### Community Feedback and Bug Tracking

* Discord utilizes the community-managed r/DiscordApp subreddit to gather user feedback on software regressions.
* A dedicated Bimonthly Bug Megathread acts as a direct line of communication between the general user base and the engineering team for reporting specific technical issues.

### Pre-release Testing via TestFlight

* Users seeking early access to features can participate in the Discord TestFlight program on iOS.
* This beta testing phase allows the development team to identify and resolve "pesky bugs" in a controlled environment before the code reaches the stable production branch.

### Deployment and Version Control

* All improvements and bug squishing listed in the series represent code that has already been committed and merged into the repository.
* Despite being merged, these updates follow a staggered deployment schedule, meaning availability on individual platforms may vary as the rollout progresses to all users.

To help maintain platform stability and gain early access to new functionality, users should consider joining the iOS TestFlight program or documenting persistent issues within the official community Reddit threads.

line

Replacing the Payment System DB Handling

The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools.

### Protocol Selection and Implementation

- The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing.
- Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol.
- Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate.

### Keyspace Architecture and Data Routing

- The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data.
- The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform.
- The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history.
- VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead.

### MySQL Compatibility and Transaction Logic

- Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`.
- Advanced features like two-phase commit (2PC) are available for handling distributed transactions across multiple shards.
- Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility.
- Certain limitations apply, such as temporary tables being supported only in unsharded keyspaces, along with specific unsupported SQL cases documented in the Vitess core.

### Automated Operations and Monitoring

- The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or stalled replication.
- Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000).
- Real-time alerts are routed through Slack and email, using `tablet_alias` to pinpoint which MySQL node or VTTablet is experiencing issues.
- A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time.

For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.
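The dual-keyspace layout described above can be sketched as a Vitess VSchema. This is a minimal illustration assuming hypothetical table and column names (`coin_balance`, `user_id`, and so on); it is not LINE's actual schema, only the pattern of an unsharded keyspace holding a sequence table plus a sharded keyspace keyed by a hash vindex.

```python
import json

# Global (unsharded) keyspace: metadata plus a "sequence" table that hands
# out unique, auto-incrementing IDs for rows spread across all shards.
global_vschema = {
    "sharded": False,
    "tables": {
        "coin_balance_seq": {"type": "sequence"},
    },
}

# Service (sharded) keyspace: transaction data distributed over N shards
# by a hash vindex on the sharding key.
service_vschema = {
    "sharded": True,
    "vindexes": {
        "hash": {"type": "hash"},
    },
    "tables": {
        "coin_balance": {
            # user_id is the sharding key; the hash vindex maps it to a shard.
            "column_vindexes": [{"column": "user_id", "name": "hash"}],
            # IDs come from the sequence table in the global keyspace.
            "auto_increment": {
                "column": "id",
                "sequence": "global.coin_balance_seq",
            },
        },
    },
}

# With this VSchema loaded, VTGate can route a query such as
#   SELECT * FROM coin_balance WHERE user_id = 42
# to exactly one shard instead of scattering it to all N.
print(json.dumps(service_vschema, indent=2))
```

The essential design point is that the sequence lives in the unsharded keyspace, so identifiers stay globally unique even though the rows they number are scattered across shards.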

google

MLE-STAR: A state-of-the-art machine learning engineering agent

MLE-STAR is a state-of-the-art machine learning engineering agent designed to automate complex ML tasks by treating them as iterative code optimization challenges. Unlike previous agents that rely solely on an LLM's internal knowledge, MLE-STAR integrates external web searches and targeted ablation studies to pinpoint and refine specific pipeline components. This approach allows the agent to achieve high-performance results, evidenced by its ability to win medals in 63% of Kaggle competitions within the MLE-Bench-Lite benchmark.

## External Knowledge and Targeted Ablation

The core of MLE-STAR's effectiveness lies in its ability to move beyond generic machine learning libraries by incorporating external research and specific performance testing.

* The agent uses web search to retrieve task-specific, state-of-the-art models and approaches rather than defaulting to familiar libraries like scikit-learn.
* Instead of modifying an entire script at once, the system conducts an ablation study to evaluate the impact of individual pipeline components, such as feature engineering or model selection.
* By identifying which code blocks have the most significant impact on performance, the agent can focus its reasoning and optimization efforts where they are most needed.

## Iterative Refinement and Intelligent Ensembling

Once the critical components are identified, MLE-STAR employs a specialized refinement process to maximize the effectiveness of the generated solution.

* Targeted code blocks undergo iterative refinement based on LLM-suggested plans that incorporate feedback from prior experimental failures and successes.
* The agent features a unique ensembling strategy where it proposes multiple candidate solutions and then designs its own method to merge them.
* Rather than using simple validation-score voting, the agent iteratively improves the ensemble strategy itself, treating the combination of models as a distinct optimization task.

## Robustness and Safety Verification

To ensure the generated code is both functional and reliable for real-world deployment, MLE-STAR incorporates three specialized diagnostic modules.

* **Debugging Agent:** Automatically analyzes tracebacks and execution errors in Python scripts to provide iterative corrections.
* **Data Leakage Checker:** Reviews the solution script prior to execution to ensure the model does not improperly access test dataset information during the training phase.
* **Data Usage Checker:** Analyzes whether the script is utilizing all available data sources, preventing the agent from overlooking complex data formats in favor of simpler files like CSVs.

By combining external grounding with a granular, component-based optimization strategy, MLE-STAR represents a significant shift in automated machine learning. For organizations looking to scale their ML workflows, such an agent suggests a future where the role of the engineer shifts from manual coding to high-level supervision of autonomous agents that can navigate the vast landscape of research and data engineering.
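The ablation-driven targeting described above can be sketched as a small loop: score the full pipeline, re-score it with each component removed, and refine the component whose removal hurts the most. The pipeline components and the scoring function here are toy stand-ins; the real agent evaluates actual generated code against validation metrics.

```python
def evaluate(components):
    """Hypothetical validation score for a pipeline built from `components`.
    The weights are fabricated for illustration only."""
    weights = {"feature_engineering": 0.30, "model_selection": 0.15,
               "ensembling": 0.05}
    return 0.50 + sum(weights[c] for c in components)

def ablation_study(components):
    """Score the full pipeline, then re-score with each component ablated.
    The component whose removal costs the most score is the refinement target."""
    full_score = evaluate(components)
    impact = {c: full_score - evaluate([o for o in components if o != c])
              for c in components}
    return max(impact, key=impact.get)

pipeline = ["feature_engineering", "model_selection", "ensembling"]
target = ablation_study(pipeline)
print(target)  # the highest-impact block, which refinement then focuses on
```

The point of the sketch is the control flow: rather than rewriting the whole script, the agent spends its LLM refinement budget only on the block the ablation identifies as most consequential.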

google

Simulating large systems with Regression Language Models

Researchers from Google have introduced Regression Language Models (RLMs) as a universal solution for numeric prediction tasks by framing regression as a text-to-text problem. By converting complex, unstructured system data into strings, RLMs can predict performance metrics without the need for manual feature engineering or data normalization. This approach allows language models to move beyond subjective human feedback and directly model raw operational data for large-scale software and industrial infrastructures.

## Conceptualizing Text-to-Text Regression

* Traditional regression methods rely on tabular data (fixed-length numeric vectors), which is laborious to maintain for evolving systems like software logs or hardware patterns.
* RLMs represent the input state ($x$) as a structured text string (such as JSON or YAML) and the numerical output ($y$) as a text string.
* The model is trained using standard next-token prediction and cross-entropy loss, allowing it to function as a universal approximator for complex data types.
* This paradigm eliminates the need for manual feature engineering, as the model learns directly from the raw textual representation of the system state.

## Architecture and Training for Large Systems

* The research utilizes a compact RLM consisting of a two-layer encoder-decoder architecture with 60 million parameters.
* To manage inputs that can reach up to 1 million tokens, the system reorders features by importance at the beginning of the string so that critical data is preserved when truncated to the model's 8k-token limit.
* Pre-training the RLM on diverse regression tasks enables few-shot adaptation, allowing the model to adjust to new data types with minimal gradient updates.
* Numerical values are processed as-is within the text, removing the requirement for traditional scaling or normalization common in standard machine learning pipelines.

## Optimizing Google's Borg Infrastructure

* The method was specifically applied to Google's Borg system to predict MIPS per GCU (millions of instructions per second per Google Compute Unit), a vital efficiency metric.
* The RLM simulates the outcomes of complex bin-packing algorithms within a "digital twin" framework to optimize resource allocation across CPUs and TPUs.
* By analyzing execution traces and textual metadata, the model provides high-accuracy forecasting for diverse workloads including Gmail, YouTube, and Maps.

## Density Capture and Uncertainty Modeling

* Unlike traditional regressors that provide a single point estimate, RLMs can capture full probability distributions by sampling the decoded output multiple times.
* This density estimation is critical for modeling aleatoric uncertainty, which represents the inherent randomness and stochastic load demands of large-scale compute environments.
* The ability to visualize these distributions helps engineers identify the range of possible outcomes and the inherent variability of the system's performance over time.

This research demonstrates that small, specialized language models can effectively replace traditional regression methods in highly dynamic environments. For practitioners looking to implement these capabilities, the open-source `regress-lm` library provides a framework for simulating large systems and predicting performance across varied industrial and scientific use cases.
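The input-side preparation described above (serialize the state to text, put important features first, then truncate to the context budget) can be sketched as follows. The feature names, the importance ranking, and the tiny 3-line budget are all illustrative assumptions; the actual system works with contexts on the order of 8k tokens.

```python
import json

def serialize_state(state, importance, budget):
    """Emit `key: value` lines, most important features first, then keep
    only the first `budget` lines, so truncation drops the least
    important data rather than whatever happened to come last."""
    ordered = sorted(state, key=lambda k: importance.get(k, 0), reverse=True)
    lines = [f"{k}: {json.dumps(state[k])}" for k in ordered]
    return "\n".join(lines[:budget])

# Hypothetical system state and an externally supplied importance ranking.
state = {"job_priority": 200, "cpu_request": 4.0,
         "region": "us-east", "debug_label": "exp-17"}
importance = {"cpu_request": 3, "job_priority": 2, "region": 1}

text = serialize_state(state, importance, budget=3)
print(text)
# cpu_request leads the string; the unranked debug_label is what the
# 3-line budget truncates away
```

The resulting string is what the encoder consumes; the numeric target is then read back out of the decoder's text output, and sampling that decode repeatedly yields the density estimates discussed above.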

line

Introducing a case of utilizing DDD in

LY Corporation's ABC Studio developed a specialized retail Merchant system by leveraging Domain-Driven Design (DDD) to overcome the functional limitations of a legacy food-delivery infrastructure. The project demonstrates that the primary value of DDD lies not just in technical implementation, but in aligning organizational structures and team responsibilities with domain boundaries. By focusing on the roles and responsibilities of the system rather than just the code, the team created a scalable platform capable of supporting diverse consumer interfaces.

### Redefining the Retail Domain

* The legacy system treated retail items like restaurant entries, creating friction for specialized retail services; the new system was built to be a standalone platform.
* The team narrowed the domain focus to five core areas: Shop, Item, Category, Inventory, and Order.
* Sales-specific logic, such as coupons and promotions, was delegated to external "Consumer Platforms," allowing the Merchant system to serve as a high-performance information provider.

### Clean Architecture and Modular Composition

* The system utilizes Clean Architecture to ensure domain entities remain independent of external frameworks, which also provided a manageable learning curve for new team members.
* Services are split into two distinct modules: "API" modules for receiving external requests and "Engine" modules for processing business logic.
* Communication between these modules is handled asynchronously via gRPC and Apache Kafka, using the Decaton library to increase throughput while maintaining a low partition count.
* The architecture prioritizes eventual consistency, allowing for high responsiveness and scalability across the platform.

### Global Collaboration and Conway's Law

* Development was split between teams in Korea (Core Domain) and Japan (System Integration and BFF), requiring a shared understanding of domain boundaries.
* Architectural Decision Records (ADRs) were adopted to document critical decisions and prevent "knowledge drift" during long-term collaboration.
* The organizational structure was intentionally designed to mirror the system architecture, with specific teams (Core, Link, BFF, and Merchant Link) assigned to distinct domain layers.
* This alignment, reflecting Conway's Law, ensures that changes to external consumer platforms have minimal impact on the stable core domain logic.

Successful DDD adoption requires moving beyond technical patterns like hexagonal architecture and focusing on establishing a shared understanding of roles across the organization. By structuring teams to match domain boundaries, companies can build resilient systems where the core business logic remains protected even as the external service ecosystem evolves.
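The API/Engine split with eventual consistency can be sketched with an in-memory queue standing in for the Kafka topic. Everything here (the event type, the inventory state, the function names) is a hypothetical illustration of the pattern, not ABC Studio's code; in production the queue is Kafka consumed through Decaton.

```python
from collections import deque

events = deque()            # stand-in for a Kafka topic
inventory = {"item-1": 10}  # Engine-side domain state

def api_reserve_stock(item_id, qty):
    """API module: validate and publish an event; no business logic here.
    The caller gets an ack before the Engine has processed anything."""
    events.append({"type": "StockReserved", "item": item_id, "qty": qty})
    return "accepted"

def engine_drain():
    """Engine module: consume events and apply the business rules."""
    while events:
        e = events.popleft()
        if e["type"] == "StockReserved":
            inventory[e["item"]] -= e["qty"]

status = api_reserve_stock("item-1", 3)
assert inventory["item-1"] == 10  # not yet consistent: event still queued
engine_drain()
assert inventory["item-1"] == 7   # consistent once the Engine catches up
```

The window between the two assertions is exactly the eventual-consistency trade-off the article describes: the API stays highly responsive because it never waits for the domain logic to run.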

google

SensorLM: Learning the language of wearable sensors

SensorLM is a new family of foundation models designed to bridge the gap between high-dimensional wearable sensor data and natural language descriptions. By training on a massive dataset of nearly 60 million hours of de-identified health data, the models learn to interpret complex physiological signals to provide meaningful context for human activities. This research demonstrates that integrating multimodal sensor signals with language models enables sophisticated health insights, such as zero-shot activity recognition and automated health captioning, that significantly outperform general-purpose large language models.

## Dataset Scale and Automated Annotation

* The models were pre-trained on an unprecedented 59.7 million hours of multimodal sensor data collected from over 103,000 individuals across 127 countries.
* To overcome the high cost of manual annotation, researchers developed a hierarchical pipeline that automatically generates text descriptions by calculating statistics and identifying trends within the raw sensor streams.
* Data was sourced from Fitbit and Pixel Watch devices, representing nearly 2.5 million person-days of activity and health information.

## Hybrid Training Architecture

* SensorLM unifies two primary multimodal strategies: contrastive learning and generative pre-training.
* Through contrastive learning, the model learns to discriminate between different states (such as a "light swim" versus a "strength workout") by matching sensor segments to corresponding text descriptions.
* The generative component allows the model to "speak" for the sensors, producing nuanced, context-aware natural language captions directly from high-dimensional biometric signals.

## Activity Recognition and Cross-Modal Capabilities

* The model demonstrates state-of-the-art performance in zero-shot human activity recognition, accurately classifying 20 different activities without any task-specific fine-tuning.
* Its few-shot learning capabilities allow the model to adapt to new tasks or individual user patterns with only a handful of examples.
* SensorLM facilitates cross-modal retrieval, enabling users or experts to find specific sensor patterns using natural language queries or to generate descriptions based on specific sensor inputs.

## Generative Health Captioning

* Beyond simple classification, the model can generate hierarchical captions that describe the statistical, structural, and semantic dimensions of a user's data.
* Experimental results using metrics like BERTScore show that SensorLM produces captions that are more factually correct and coherent than those created by powerful non-specialist LLMs.
* This capability allows for the translation of abstract data points, such as heart rate variability or step counts, into readable summaries that explain the "why" behind physiological changes.

By providing a framework where wearable data can be understood through the lens of human language, SensorLM paves the way for more intuitive and personalized health monitoring. This technology holds the potential to transform raw biometric streams into actionable insights, helping users better understand the relationship between their activities and their overall physical well-being.
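The contrastive zero-shot recognition described above can be sketched as nearest-label matching in a shared embedding space: embed the sensor segment, embed each candidate activity description, and pick the most similar label. The 3-dimensional embeddings below are fabricated for illustration; SensorLM learns its sensor and text encoders jointly from the data described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend outputs of a text encoder for each candidate activity label.
label_embeddings = {
    "light swim":       [0.9, 0.1, 0.2],
    "strength workout": [0.1, 0.9, 0.3],
    "outdoor run":      [0.2, 0.3, 0.9],
}

# Pretend output of a sensor encoder for one wearable-sensor segment.
sensor_embedding = [0.85, 0.15, 0.25]

# Zero-shot classification: no fine-tuning, just similarity in the
# shared space that contrastive training produces.
prediction = max(label_embeddings,
                 key=lambda lbl: cosine(sensor_embedding, label_embeddings[lbl]))
print(prediction)
```

Because the label set is just a list of text strings, new activities can be recognized simply by adding their descriptions, which is what makes the approach zero-shot.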

google

Synthetic and federated: Privacy-preserving domain adaptation with LLMs for mobile applications

Researchers at Google have developed a framework for improving both small and large language models (LMs) in mobile applications like Gboard by utilizing privacy-preserving synthetic data and federated learning. This approach combines differential privacy (DP) with large language model (LLM) generation to minimize data memorization risks while achieving significant gains in production metrics like next-word prediction and proofreading. The result is a robust pipeline that allows models to adapt to specific user domains without compromising individual privacy or requiring centralized data storage.

### Strengthening Privacy with DP-FL

* Gboard has transitioned all production LMs trained on user data to a Federated Learning with Differential Privacy (DP-FL) framework, ensuring data remains on-device and memorization risk is formally bounded.
* The deployment utilizes the **BLT-DP-FTRL** algorithm, which offers an optimized trade-off between privacy guarantees and model utility while being easier to deploy in production.
* Engineers adopted the **SI-CIFG** model architecture to facilitate efficient on-device training, ensuring the hardware can handle local updates while maintaining compatibility with DP constraints.

### Synthetic Data Generation via Public LLMs

* Powerful LLMs trained on public web data are prompted to synthesize high-quality text that mimics mobile user interactions without ever accessing actual private user data.
* The process involves a two-step prompting strategy: first, filtering public datasets to identify topics common in mobile communication, and second, generating new, domain-specific text based on those patterns.
* This synthetic data serves as a bridge for pre-training small LMs, which are then refined through private post-training on-device to capture the nuances of user behavior.

### Adapting LLMs for Mobile Proofreading

* To support advanced features like Gboard's "Proofread," researchers developed a "Synthesize-then-Adapt" pipeline specifically for error correction.
* LLMs generate synthetic "corrupted" text to simulate common mobile typing errors, providing the necessary training pairs (error/correction) that are difficult to find in public datasets.
* Federated learning is then used to adapt these error-correction models to specific app domains (such as messaging or email) using on-device signals, ensuring the model understands the specific context of the user's typing.

The success of these techniques in Gboard demonstrates that synthetic data can effectively replace or augment private data throughout the machine learning lifecycle. For developers working with sensitive user information, adopting a "synthetic-first" approach combined with federated learning provides a scalable path to model improvement that adheres to the core principles of data minimization and anonymization.
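The DP-FL aggregation step underlying the framework above can be sketched generically: each client's model update is clipped to a fixed L2 norm, the clipped updates are averaged, and calibrated Gaussian noise is added before the server applies the result. The clip norm and noise scale below are illustrative; Gboard's production recipe (BLT-DP-FTRL with SI-CIFG models) is considerably more involved than this textbook DP-SGD-style round.

```python
import math
import random

CLIP = 1.0       # per-client L2 clipping bound (illustrative)
NOISE_STD = 0.1  # Gaussian noise scale (illustrative)

def clip_update(update, clip=CLIP):
    """Scale an update down so its L2 norm is at most `clip`, bounding
    any single device's influence on the aggregate."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_aggregate(client_updates, rng):
    """Average the clipped updates, then add calibrated Gaussian noise
    so the aggregate reveals a bounded amount about any one client."""
    clipped = [clip_update(u) for u in client_updates]
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    return [x + rng.gauss(0.0, NOISE_STD / n) for x in avg]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
updates = [[3.0, 4.0], [0.3, -0.4], [-1.0, 0.0]]  # raw on-device updates
noisy_avg = dp_aggregate(updates, rng)
print(noisy_avg)
```

Clipping plus noise is what lets the privacy guarantee be stated formally: no single device can dominate the averaged update, and the noise bounds what the aggregate can leak about any individual's typing.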