Time series foundation models can be few-shot learners
Researchers at Google have introduced TimesFM-ICF, a foundation model that moves time-series forecasting from zero-shot to few-shot learning via in-context fine-tuning. Through continued pre-training and a specialized separator token, the model learns to adapt to a handful of related examples at inference time, without the supervised fine-tuning typically needed for task-specific optimization. The approach matches or exceeds the performance of specialized models while retaining the flexibility of a general-purpose foundation model.
Overcoming the Limitations of Zero-Shot Models
- Traditional forecasting often requires building separate, specialized models for every unique task, which is resource-intensive and slow.
- While zero-shot models like the original TimesFM provide immediate forecasts without task-specific training, they cannot incorporate relevant context, such as data from nearby sensors or similar historical patterns.
- The In-Context Fine-tuning (ICF) approach allows the model to "learn" from a few examples provided at the time of prediction, similar to how Large Language Models (LLMs) use few-shot prompting.
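To make the analogy concrete, here is a minimal NumPy sketch of how such a few-shot context could be laid out: a handful of related series, each followed by a separator, then the target's own history. This is an illustration, not the released TimesFM-ICF code; the `SEP` sentinel and `build_icf_context` helper are hypothetical, and in the real model the separator is a learned token in embedding space rather than special data values.

```python
import numpy as np

# Hypothetical sentinel standing in for the learned separator token; in the
# real model the separator lives in embedding space, not in the data values.
SEP = np.full(4, np.nan)

def build_icf_context(target_history, support_series):
    """Lay out a few-shot context: each related example, a separator after it,
    and finally the target's own history for the forecast to continue from."""
    parts = []
    for series in support_series:
        parts.append(np.asarray(series, dtype=float))
        parts.append(SEP)  # boundary so the model never blends two streams
    parts.append(np.asarray(target_history, dtype=float))
    return np.concatenate(parts)

# e.g. sales from two nearby stores as in-context examples for a third store
context = build_icf_context(
    target_history=np.sin(np.linspace(0, 6, 96)),
    support_series=[np.sin(np.linspace(0, 6, 96)) * s for s in (0.8, 1.2)],
)
```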
Architecture and the Common Separator Token
- TimesFM-ICF utilizes a patched decoder architecture that tokenizes 32 contiguous timepoints into a single input token.
- To prevent the model from conflating different data streams—such as separate store locations or distinct time periods—researchers introduced a "common separator token" as a digital boundary between examples.
- The model processes these tokens through a transformer stack using causal self-attention (CSA), ensuring it learns from historical context without accidentally "peeking" into the future.
- A shared multilayer perceptron (MLP) translates the processed output tokens back into a forecast spanning 128 timepoints (the full pipeline is sketched below).
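The toy PyTorch module below is a minimal stand-in for this pipeline, not the actual TimesFM-ICF implementation: it patches a series into 32-point input tokens, runs them through a causally masked transformer stack, and maps each output token to a 128-point forecast with a shared MLP head. The layer count, width, and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

PATCH, HORIZON, D = 32, 128, 256  # 32 timepoints per input token, 128 per output token

class PatchedDecoderSketch(nn.Module):
    """Minimal stand-in for the patched-decoder pipeline described above
    (illustrative only; not the released TimesFM-ICF architecture or sizes)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(PATCH, D)  # one token per 32-point patch
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(        # shared MLP: output token -> 128-point forecast
            nn.Linear(D, D), nn.ReLU(), nn.Linear(D, HORIZON)
        )

    def forward(self, series):            # series: (batch, n_patches * 32)
        tokens = self.embed(series.reshape(series.shape[0], -1, PATCH))
        n = tokens.shape[1]
        # Causal self-attention: each token may attend only to itself and the past.
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        hidden = self.blocks(tokens, mask=causal)
        return self.head(hidden)          # (batch, n_patches, HORIZON)

model = PatchedDecoderSketch()
forecast = model(torch.randn(2, 8 * PATCH))[:, -1]  # 128-step forecast from the last token
```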
Performance Benchmarking and Results
- The model was evaluated on 23 unseen datasets, using the Mean Absolute Scaled Error (MASE) metric to aggregate performance across diverse time-series tasks (a minimal MASE implementation follows this list).
- TimesFM-ICF demonstrated a significant performance boost over the original zero-shot TimesFM and other state-of-the-art foundation models like Moirai and Lag-Llama.
- Test results showed that providing just a few in-context examples allowed the model to match the accuracy of supervised fine-tuning, which typically demands far more computation and data curation.
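For reference, MASE scales a forecast's mean absolute error by the in-sample error of a naive seasonal forecast on the training series, which is what makes scores comparable across datasets with different scales. A minimal implementation:

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample MAE
    of a naive seasonal forecast on the training series."""
    naive_mae = np.mean(np.abs(np.asarray(y_train)[season:] - np.asarray(y_train)[:-season]))
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / naive_mae

# A score below 1.0 beats the naive forecast; per-dataset scores can then be
# aggregated (e.g. via a geometric mean) across heterogeneous benchmarks.
print(mase(y_true=[10, 12, 11], y_pred=[9, 12, 13], y_train=[8, 9, 10, 9, 11]))  # 0.8
```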
TimesFM-ICF represents a practical shift for businesses managing diverse data streams, offering a way to achieve high-accuracy forecasts by simply providing a few relevant historical examples. For organizations forecasting inventory or energy demand, this method provides the precision of a custom-tuned model with the deployment speed of a pre-trained foundation model.