Google, Jul 21, 2025

LSM-2: Learning from incomplete wearable sensor data

ai transformer foundation-models wearable-technology self-supervised-learning masked-autoencoders attention-masking

LSM-2 changes how wearable sensor data is processed by treating naturally occurring data gaps as an inherent property of the data rather than as errors to be corrected. Using the Adaptive and Inherited Masking (AIM) framework, the model learns directly from fragmented, real-world data streams, with no need for imputation, which can introduce bias, or for filters that discard incomplete data. This approach lets LSM-2 achieve state-of-the-art performance on health-related classification and regression tasks while remaining robust when sensors fail or data is heavily interrupted.

The Challenge of Pervasive Missingness

Real-world wearable data is almost never continuous; factors such as device charging, motion artifacts, and battery-saving modes create frequent "missingness."
Conventional self-supervised learning approaches assume complete data, forcing researchers either to impute missing values, which can introduce artificial bias, or to filter aggressively, which can discard over 90% of potentially useful samples.
In a dataset of 1.6 million day-long windows, the researchers found that not a single window was free of missing data (0% missingness), which makes training only on complete data impractical.
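To make the missingness statistic concrete, here is a minimal Python sketch that measures the fraction of missing samples in each window and shows what a strict completeness filter keeps. The helper names (`missingness`, `kept_fraction`), the use of `None` for a missing sample, and the toy heart-rate values are illustrative assumptions, not part of LSM-2.

```python
from typing import Optional, Sequence


def missingness(window: Sequence[Optional[float]]) -> float:
    """Fraction of samples in a window that are missing (None)."""
    if not window:
        return 1.0  # an empty window carries no observed data at all
    return sum(x is None for x in window) / len(window)


def kept_fraction(windows: Sequence[Sequence[Optional[float]]], max_missing: float) -> float:
    """Share of windows that survive a filter allowing at most max_missing gaps."""
    kept = [w for w in windows if missingness(w) <= max_missing]
    return len(kept) / len(windows)


# Toy example: three 8-sample windows, each with at least one gap
# (e.g. from charging, motion artifacts, or a battery-saving mode).
windows = [
    [72.0, 73.0, None, 74.0, 75.0, 74.0, 73.0, 72.0],
    [None, None, 80.0, 81.0, 82.0, 81.0, None, 79.0],
    [65.0, 66.0, 66.0, 67.0, 68.0, 67.0, 66.0, None],
]

print([missingness(w) for w in windows])  # [0.125, 0.375, 0.125]
print(kept_fraction(windows, 0.0))        # 0.0: requiring complete data keeps nothing
print(kept_fraction(windows, 0.2))        # 2 of the 3 windows pass
```

When every window has at least one gap, a zero-missingness filter keeps nothing; relaxing the threshold recovers some windows but still throws data away.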

Adaptive and Inherited Masking (AIM)

AIM extends the Masked Autoencoder (MAE) framework by treating "inherited" masks (naturally occurring gaps) and "artificial" masks (training objectives) as equivalent.
The framework uses a dual masking strategy. First, it applies token dropout to a fixed ratio of tokens, which keeps encoding computationally efficient.
Second, to handle the unpredictable and variable extent of real-world gaps, AIM applies attention masking within the transformer blocks to any remaining masked tokens.
During evaluation and fine-tuning, the model relies solely on attention masking to navigate naturally occurring gaps, allowing for accurate physiological modeling without filling in missing values.
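The dual masking strategy can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not LSM-2's implementation: tokens are short lists of floats, the artificial masking ratio and the fixed dropout count are hypothetical parameters, token dropout is assumed to remove only masked tokens (visible tokens must reach the encoder), and attention is a single head with identity query/key/value projections. The helpers `aim_masks` and `masked_self_attention` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def aim_masks(inherited: Sequence[bool], artificial_ratio: float,
              drop_count: int, rng: random.Random) -> Tuple[List[int], List[int], Set[int]]:
    """Hypothetical helper: combine inherited and artificial masks, then split
    the masked tokens into 'dropped' and 'attention-masked' groups."""
    n = len(inherited)
    observed = [i for i in range(n) if not inherited[i]]
    # Artificial mask: hide a fraction of the observed tokens as reconstruction targets.
    artificial = set(rng.sample(observed, round(artificial_ratio * len(observed))))
    # From here on, inherited and artificial masks are treated identically.
    masked = [i for i in range(n) if inherited[i] or i in artificial]
    if drop_count > len(masked):
        raise ValueError("drop_count exceeds the number of masked tokens")
    # Token dropout: remove a FIXED number of masked tokens, so every sample
    # reaches the encoder with the same sequence length.
    dropped = set(rng.sample(masked, drop_count))
    # Masked tokens that were not dropped stay in the sequence but are hidden by attention.
    attn_masked = [i for i in masked if i not in dropped]
    kept = [i for i in range(n) if i not in dropped]
    return kept, attn_masked, artificial


def masked_self_attention(x: List[List[float]], key_mask: Sequence[bool]) -> List[List[float]]:
    """Single-head self-attention with identity Q/K/V projections.
    key_mask[j] is True when token j must not be attended to."""
    if all(key_mask):
        raise ValueError("at least one token must be visible")
    d = len(x[0])
    out = []
    for q in x:
        # Masked keys get a score of -inf, i.e. exactly zero weight after the softmax.
        scores = [float("-inf") if key_mask[j]
                  else sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for j, k in enumerate(x)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


# Toy sequence of 8 tokens with 2 naturally occurring gaps (inherited mask).
inherited = [False, True, False, False, True, False, False, False]
# Gaps carry a placeholder value; they are never attended to.
tokens = [[0.0, 0.0] if gap else [float(i), 1.0] for i, gap in enumerate(inherited)]

rng = random.Random(0)
kept, attn_masked, artificial = aim_masks(inherited, artificial_ratio=0.5, drop_count=3, rng=rng)
# Encoder input after token dropout; masked tokens carry a placeholder, not their values.
x = [[0.0, 0.0] if i in attn_masked else tokens[i] for i in kept]
key_mask = [i in attn_masked for i in kept]   # remaining masked tokens
y = masked_self_attention(x, key_mask)        # training-time encoder step
print(len(kept), len(attn_masked))            # 5 2

# Evaluation / fine-tuning: no artificial mask, no dropout, attention masking only.
y_eval = masked_self_attention(tokens, inherited)
```

Dropping a fixed number of tokens gives every training sample the same encoder length, which is what keeps batching efficient; the attention mask absorbs however many masked tokens remain, a number that varies with the inherited gaps. At evaluation time the same attention function is called with only the inherited mask and no dropout.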

Scale and Training Architecture

LSM-2 was trained on a massive dataset comprising 40 million hours of de-identified wearable data from more than 60,000 participants using Fitbit and Google Pixel devices.
The model learns to understand underlying physiological structures by reconstructing masked segments across multimodal inputs, including heart signals, sleep patterns, and activity levels.
Because it is trained on fragmented data, the resulting foundation model is significantly more resilient to sensor dropouts in downstream tasks like hypertension prediction or stress monitoring.
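Because inherited gaps have no ground truth, a reconstruction objective can only be scored where a value was actually observed and then hidden. The sketch below assumes a mean-squared-error loss restricted to artificially masked positions, a common masked-autoencoder convention; the function name `masked_mse`, the single-channel layout, and the numbers are hypothetical, and a multimodal version would apply the same rule per channel.

```python
from typing import Optional, Sequence, Set


def masked_mse(pred: Sequence[float], target: Sequence[Optional[float]],
               artificial: Set[int]) -> float:
    """Hypothetical reconstruction loss: MSE over artificially masked positions.

    Inherited gaps (target is None) have no ground truth, so they never
    contribute to the loss, whatever the model predicts there.
    """
    idx = [i for i in sorted(artificial) if target[i] is not None]
    if not idx:
        return 0.0
    return sum((pred[i] - target[i]) ** 2 for i in idx) / len(idx)


# One channel (e.g. heart rate) with inherited gaps at positions 1 and 4.
target = [70.0, None, 72.0, 74.0, None, 76.0]
pred = [70.5, 71.0, 71.0, 74.0, 75.0, 78.0]   # decoder output for every position
artificial = {2, 3, 5}                         # observed values hidden during training

print(masked_mse(pred, target, artificial))    # (1 + 0 + 4) / 3
```

Predictions at positions 1 and 4 can be anything; the model is only graded where a real measurement exists.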

LSM-2 demonstrates that foundation models for health should be built to embrace the messiness of real-world environments. By integrating missingness directly into the self-supervised learning objective, developers can bypass the computational and statistical overhead of imputation while building more reliable diagnostic and monitoring tools.