Robust statistical distances for machine learning | Datadog

Datadog has introduced Toto, a new open-weights foundation model specifically designed for time-series forecasting and anomaly detection within observability contexts. While general-purpose time-series models often struggle with the unique volatility and high-frequency patterns of IT telemetry, Toto is pre-trained on a massive dataset of 500 billion observations to provide superior zero-shot performance. This release, accompanied by the BOOM benchmark, addresses the critical need for specialized AI tools capable of handling the complexity of modern cloud infrastructure.

### Toto Model Architecture and Training

* Toto utilizes a decoder-only transformer architecture, adapting large language model (LLM) principles to the domain of continuous numerical data.
* The model employs a "patching" mechanism, which groups multiple time-series data points into single tokens to improve computational efficiency and allow the model to capture longer historical dependencies.
* It incorporates Rotary Positional Embeddings (RoPE) to better handle sequences of varying lengths and maintain temporal relationships across different frequencies.
* Training was conducted on a curated dataset of 500 billion anonymized data points from real-world observability metrics, including CPU usage, memory consumption, and network traffic.

### Specialized Observability Features

* Unlike existing models like TimesFM or Chronos, which are trained on diverse but general datasets like weather or retail trends, Toto is optimized for the specific "spikiness" and abrupt level shifts common in IT environments.
* The model supports zero-shot forecasting, allowing users to generate predictions for new metrics immediately without the need for expensive or time-consuming fine-tuning.
* Toto is designed to handle varying sampling rates, from one-second intervals to hourly aggregations, making it versatile across different infrastructure layers.
* The open-weights release on Hugging Face allows researchers and engineers to integrate the model into their own AIOps workflows or private cloud environments.

### The BOOM Evaluation Framework

* Datadog released the Benchmarking Observability Models (BOOM) framework to provide a standardized method for evaluating time-series models on infrastructure-specific tasks.
* BOOM focuses on metrics that represent real-world operational challenges, such as seasonal traffic patterns and sudden system failures.
* Comparative testing shows that Toto consistently outperforms general-purpose models in Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) when applied to observability datasets.
* The benchmark provides a transparent way for the industry to measure progress in time-series foundation models, moving beyond generic datasets that do not reflect the realities of microservices and distributed systems.

Organizations looking to automate capacity planning, optimize cloud spend, or implement intelligent alerting should consider adopting Toto for their time-series analysis. By utilizing the open-weights model alongside the BOOM benchmark, teams can achieve high-accuracy forecasting and objective performance validation without the overhead of building specialized models from scratch.
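
To make the "patching" idea from the architecture notes concrete, here is a minimal sketch of how a run of metric samples can be grouped into fixed-size patches, each of which would then be embedded as a single token. This is not Toto's actual code: the function name `patchify`, the patch size of 4, and the choice to drop a trailing partial patch are all assumptions made for illustration.

```python
import numpy as np


def patchify(series, patch_size):
    """Split a 1-D series into non-overlapping patches of length patch_size.

    Trailing points that do not fill a whole patch are dropped. That is a
    simplifying assumption; a real model might pad the series instead.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 1:
        raise ValueError("expected a 1-D series")
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    n_patches = len(series) // patch_size
    # Each row is one patch, i.e. one would-be token for the transformer.
    return series[: n_patches * patch_size].reshape(n_patches, patch_size)


# Hypothetical CPU-usage samples: 10 points become 2 patches of 4 points,
# so the sequence the model attends over is 2 tokens long instead of 10.
cpu = np.arange(10, dtype=float)
patches = patchify(cpu, patch_size=4)
print(patches.shape)  # (2, 4)
```

Because the sequence length shrinks by roughly the patch size, the same attention budget covers a proportionally longer stretch of history, which is the efficiency benefit described above.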
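
The two error metrics named in the BOOM comparison, MAE and RMSE, can be computed as in the sketch below. This is not the BOOM evaluation code, and the benchmark may apply normalization or aggregation that is not shown here; the sample values are made up to mimic a metric with a single spike.

```python
import numpy as np


def mae(actual, forecast):
    """Mean Absolute Error: the average size of the forecast errors."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))


def rmse(actual, forecast):
    """Root Mean Squared Error: like MAE, but large errors weigh more."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


# Hypothetical values: the third sample is a spike the forecast underestimates.
actual = [10.0, 12.0, 50.0, 11.0]
forecast = [11.0, 12.0, 20.0, 10.0]

print(mae(actual, forecast))             # 8.0
print(round(rmse(actual, forecast), 2))  # 15.02
```

The gap between the two numbers comes entirely from the missed spike: RMSE squares each error before averaging, so it penalizes exactly the kind of abrupt deviation that observability data tends to contain.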