Simulating large systems with Regression Language Models

Researchers from Google have introduced Regression Language Models (RLMs) as a universal approach to numeric prediction that frames regression as a text-to-text problem. By converting complex, unstructured system data into strings, RLMs can predict performance metrics without manual feature engineering or data normalization. Rather than relying on subjective human feedback, this approach lets language models learn directly from the raw operational data of large-scale software and industrial infrastructure.

Conceptualizing Text-to-Text Regression

  • Traditional regression methods rely on tabular data, i.e. fixed-length numeric vectors, a representation that is laborious to build and maintain for evolving systems such as software logs or changing hardware configurations.
  • RLMs represent the input state ($x$) as a structured text string (such as JSON or YAML) and the numerical output ($y$) as a text string.
  • The model is trained using standard next-token prediction and cross-entropy loss, allowing it to function as a universal approximator for complex data types.
  • This paradigm eliminates the need for manual feature engineering, as the model learns directly from the raw textual representation of the system state (a minimal serialization sketch follows this list).
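
As a concrete illustration of this framing, the sketch below serializes a hypothetical system-state dictionary into a JSON string x and renders the numeric target as a short string y. The field names and the target's number format are illustrative assumptions, not details taken from the paper.

```python
import json

def serialize_state(state: dict) -> str:
    """Render an arbitrary, possibly nested system state as a JSON string.

    No feature engineering or normalization: the raw key/value structure
    becomes the model's input text x.
    """
    return json.dumps(state, sort_keys=True)

def serialize_target(y: float, sig_figs: int = 4) -> str:
    """Render the numeric target as a short string that the decoder
    predicts token by token (scientific notation here, illustrative only)."""
    return f"{y:.{sig_figs - 1}e}"

# Hypothetical example: a job's scheduling state and an observed efficiency metric.
x_text = serialize_state({
    "cell": "cell-a",
    "job": {"priority": 200, "cpus_requested": 12.0, "memory_gib": 48},
    "machine": {"platform": "gen-3", "utilization": 0.73},
})
y_text = serialize_target(1.8342)

print(x_text)  # '{"cell": "cell-a", "job": {...}, "machine": {...}}'
print(y_text)  # '1.834e+00'
```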

Architecture and Training for Large Systems

  • The research uses a compact RLM: a two-layer encoder-decoder architecture with 60 million parameters.
  • To handle raw inputs that can reach up to 1 million tokens, the system places the most important features at the beginning of the string, so critical data survives truncation to the model's 8k-token context limit (see the truncation sketch after this list).
  • Pre-training the RLM on diverse regression tasks enables few-shot adaptation, allowing the model to adjust to new data types with minimal gradient updates.
  • Numerical values are processed as-is within the text, removing the requirement for traditional scaling or normalization common in standard machine learning pipelines.
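
The sketch below illustrates this importance-first truncation under simplifying assumptions: a hypothetical per-feature importance table and a whitespace split standing in for the real tokenizer. It is a toy version of the idea, not the production implementation.

```python
import json

# Hypothetical importance weights: higher means more critical to keep.
FEATURE_IMPORTANCE = {"job": 3.0, "machine": 2.0, "cluster_events": 1.0, "debug_metadata": 0.1}

def serialize_by_importance(state: dict, max_tokens: int = 8192) -> str:
    """Emit the most important top-level features first, then truncate,
    so the tail that gets dropped is also the least important part."""
    ordered = sorted(state.items(),
                     key=lambda kv: FEATURE_IMPORTANCE.get(kv[0], 0.0),
                     reverse=True)
    text = " ".join(f"{key}:{json.dumps(value)}" for key, value in ordered)
    tokens = text.split()  # crude whitespace proxy for a real tokenizer
    return " ".join(tokens[:max_tokens])
```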

Optimizing Google's Borg Infrastructure

  • The method was specifically applied to Google’s Borg system to predict MIPS per GCU (Millions of Instructions Per Second per Google Compute Unit), a vital efficiency metric.
  • The RLM simulates the outcomes of complex bin-packing algorithms within a "digital twin" framework to optimize resource allocation across CPUs and TPUs (a toy what-if scoring loop is sketched after this list).
  • By analyzing execution traces and textual metadata, the model provides high-accuracy forecasting for diverse workloads including Gmail, YouTube, and Maps.
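
The toy loop below shows how such a predictor can act as a digital twin: candidate scheduler configurations are scored offline by their predicted MIPS per GCU and the best one is kept, with nothing rolled out to production. The predict_mips_per_gcu callable and the scheduler_config field are hypothetical placeholders, not names from the Borg system or the paper.

```python
import json
from typing import Callable

def pick_best_config(predict_mips_per_gcu: Callable[[str], float],
                     base_state: dict,
                     candidate_configs: list[dict]) -> dict:
    """Score each candidate bin-packing/scheduler configuration with the
    text-to-text predictor and return the highest-predicted-efficiency one."""
    best_config, best_score = None, float("-inf")
    for config in candidate_configs:
        # Serialize the would-be system state; the evaluation is purely offline.
        x_text = json.dumps({**base_state, "scheduler_config": config}, sort_keys=True)
        score = predict_mips_per_gcu(x_text)
        if score > best_score:
            best_config, best_score = config, score
    return best_config
```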

Density Capture and Uncertainty Modeling

  • Unlike traditional regressors that provide a single point estimate, RLMs can capture full probability distributions by sampling the decoded output multiple times.
  • This density estimation is critical for modeling aleatoric uncertainty, which represents the inherent randomness and stochastic load demands of large-scale compute environments.
  • The ability to visualize these distributions helps engineers identify the range of possible outcomes and the inherent variability of the system's performance over time (a sampling sketch follows this list).
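
A minimal sketch of that sampling procedure, assuming a hypothetical sample_y_text callable that decodes one numeric string per call: repeated decodes of the same input form an empirical predictive distribution whose spread reflects the aleatoric uncertainty described above.

```python
import statistics
from typing import Callable

def predictive_distribution(sample_y_text: Callable[[str], str],
                            x_text: str,
                            num_samples: int = 256) -> dict:
    """Decode the same input many times and summarize the draws as an
    empirical distribution (median plus an 80% interval)."""
    draws = sorted(float(sample_y_text(x_text)) for _ in range(num_samples))
    return {
        "median": statistics.median(draws),
        "p10": draws[int(0.10 * (num_samples - 1))],
        "p90": draws[int(0.90 * (num_samples - 1))],
    }
```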

This research demonstrates that small, specialized language models can effectively replace traditional regression methods in highly dynamic environments. For practitioners looking to implement these capabilities, the open-source regress-lm library provides a framework for simulating large systems and predicting performance across varied industrial and scientific use cases.
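
The quick-start below follows the interface documented in the regress-lm repository's README (RegressLM.from_default, core.Example, fine_tune, sample); treat it as a sketch and check the current documentation, since the exact API may differ across versions, and the example inputs here are invented.

```python
# pip install regress-lm  (see the project README for installation details)
from regress_lm import core, rlm

# Build a small RLM with a bounded input length.
reg_lm = rlm.RegressLM.from_default(max_input_len=2048)

# Fine-tune on (x, y) pairs, where x is any string and y is a float.
examples = [
    core.Example(x='{"job": "batch-encode", "cpus": 12}', y=1.83),
    core.Example(x='{"job": "serving-frontend", "cpus": 4}', y=0.92),
]
reg_lm.fine_tune(examples)

# Query a new input and sample the decoder repeatedly to get a distribution.
queries = [core.ExampleInput(x='{"job": "batch-encode", "cpus": 16}')]
samples = reg_lm.sample(queries, num_samples=128)
```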