# Titans + MIRAS: Helping AI have long-term memory

Google Research has introduced Titans, a new architecture, and MIRAS, a theoretical framework, designed to overcome the computational limitations of Transformers while preserving high-fidelity long-term memory. Both rely on "test-time memorization": the model updates its memory parameters in real time as it processes data, with no offline retraining. By combining the speed of linear recurrent neural networks (RNNs) with the accuracy of attention mechanisms, the approach lets models handle massive contexts such as genomic analysis or full-document understanding.

## Titans and Neural Long-Term Memory

* Unlike traditional RNNs, which compress context into fixed-size vectors or matrices, Titans uses a multi-layer perceptron (MLP) as a dedicated long-term memory module.
* This deep neural memory provides significantly higher expressive power, allowing the model to synthesize and understand entire narratives rather than merely store passive snapshots.
* The architecture separates memory into two distinct modules: an attention mechanism for precise short-term context, and the MLP for summarizing long-term information.

## The Gradient-Based Surprise Metric

* Titans employs a "surprise metric" to decide which information is important enough to store, mirroring the human brain's tendency to remember unexpected events.
* The model computes an internal error signal (a gradient); a high gradient indicates that the new input is anomalous or context-breaking, signaling that it should be prioritized for long-term storage.
* A momentum term tracks the flow of context over time, so that relevant follow-up information is captured even when individual tokens are not surprising on their own.
* To manage memory capacity over extremely long sequences, an adaptive weight-decay mechanism acts as a forgetting gate, discarding information that is no longer useful.
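The surprise-driven update can be made concrete with a toy sketch. The following is my own minimal NumPy illustration, not Google's implementation: a tiny two-layer MLP serves as the memory, each incoming key/value pair triggers a gradient step on an associative loss, the gradient norm plays the role of the surprise score, a momentum buffer carries past surprise forward, and weight decay acts as the forgetting gate. All hyperparameters and function names here are assumptions chosen for readability.

```python
import numpy as np

# Toy sketch of a Titans-style test-time memory update (illustrative only).
# Memory: a two-layer MLP, M(k) = W2 @ tanh(W1 @ k).
rng = np.random.default_rng(0)
D, H = 8, 16                       # key/value dim, hidden width
W1 = rng.normal(0, 0.1, (H, D))
W2 = rng.normal(0, 0.1, (D, H))
S1 = np.zeros_like(W1)             # momentum buffers ("past surprise")
S2 = np.zeros_like(W2)

def memory_read(k):
    return W2 @ np.tanh(W1 @ k)

def memory_write(k, v, lr=0.05, momentum=0.5, decay=0.001):
    """One test-time step: surprise = gradient of 0.5 * ||M(k) - v||^2."""
    global W1, W2, S1, S2
    h = np.tanh(W1 @ k)
    err = memory_read(k) - v                       # residual; its norm ~ "surprise"
    gW2 = np.outer(err, h)                         # dL/dW2
    gW1 = np.outer((W2.T @ err) * (1 - h**2), k)   # dL/dW1 (tanh backprop)
    # Momentum carries surprise across tokens; decay is the forgetting gate.
    S2 = momentum * S2 - lr * gW2
    S1 = momentum * S1 - lr * gW1
    W2 = (1 - decay) * W2 + S2
    W1 = (1 - decay) * W1 + S1
    return float(np.linalg.norm(err))              # surprise score for this token

k, v = rng.normal(size=D), rng.normal(size=D)
before = memory_write(k, v)        # first exposure: high surprise
for _ in range(20):
    memory_write(k, v)             # repeated exposure: memory absorbs the pair
after = float(np.linalg.norm(memory_read(k) - v))
print(f"surprise before={before:.3f} after={after:.3f}")
```

Repeated exposure to the same association drives the surprise score down, which is the intended behavior: familiar inputs produce small gradients and leave the memory largely untouched.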
## MIRAS: A Unified Framework for Sequence Modeling

* MIRAS provides a theoretical blueprint that views all major sequence models (including Transformers and linear RNNs) as different forms of associative memory modules.
* The framework characterizes a sequence model by four key design choices: the memory architecture (e.g., MLP vs. vector), the attentional bias (the objective scoring how well memory matches new data), the retention gate that balances new information against old, and the learning rule used to update the memory.
* This approach shifts AI modeling toward real-time adaptation, where the model actively learns and incorporates specific new details into its core knowledge as data streams in.

These advancements suggest a shift away from static context windows toward dynamic systems capable of lifelong learning. For developers working with large-scale data, the Titans architecture provides a practical tool for scaling performance, while the MIRAS framework offers a roadmap for designing next-generation models that adapt instantly to new information.
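The MIRAS view of a sequence model as an online learner over an associative memory can be sketched with a toy example. This is my own illustration, not the paper's code: the memory is a plain matrix, the attentional bias is plugged in as a loss gradient, retention is a decay factor, and the learning rule is online gradient descent. Swapping any one piece (an MLP memory, a robust L1 bias, a learned gate) yields a different model under the same recipe; all names and values here are assumptions.

```python
import numpy as np

# Toy instance of the MIRAS recipe (illustrative only):
# a sequence model = online learner over an associative memory, built from
# four pluggable choices: architecture, attentional bias, retention, update rule.

def run(keys, vals, grad_fn, lr=0.1, retention=0.99):
    """Stream (key, value) pairs through a matrix-valued memory M."""
    D = keys.shape[1]
    M = np.zeros((D, D))               # choice 1: memory architecture (matrix)
    for k, v in zip(keys, vals):
        g = grad_fn(M, k, v)           # choice 2: attentional bias, via its gradient
        M = retention * M - lr * g     # choices 3 & 4: retention gate + update rule
    return M

def l2_grad(M, k, v):
    """Bias: 0.5 * ||M k - v||^2 -- the classic delta rule."""
    return np.outer(M @ k - v, k)

def l1_grad(M, k, v):
    """Bias: ||M k - v||_1 -- a more outlier-robust alternative."""
    return np.outer(np.sign(M @ k - v), k)

rng = np.random.default_rng(1)
keys = rng.normal(size=(64, 4))
vals = keys @ rng.normal(size=(4, 4)).T        # a linear map to memorize

M2 = run(keys, vals, l2_grad)                  # same loop, two different biases
M1 = run(keys, vals, l1_grad)

def recall_err(M):
    return float(np.mean(np.linalg.norm(keys @ M.T - vals, axis=1)))

print(f"l2-bias recall error: {recall_err(M2):.3f}, l1-bias: {recall_err(M1):.3f}")
```

The point of the framework is exactly this factoring: the streaming loop stays fixed while each design choice varies independently, which is how MIRAS recovers Transformers, linear RNNs, and Titans-style models as points in one design space.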