Deciphering language processing in the human brain through LLM representations
Recent research by Google Research and collaborating universities indicates that Large Language Models (LLMs) process natural language through internal representations that closely mirror neural activity in the human brain. By comparing intracranial recordings from spontaneous conversations with the internal embeddings of the Whisper speech-to-text model, the study found a high degree of linear alignment between artificial and biological language processing. These findings suggest that the statistical structures learned by LLMs via next-word prediction provide a viable computational framework for understanding how humans comprehend and produce speech.
Mapping LLM Embeddings to Brain Activity
- Researchers used intracranial electrodes to record neural signals during real-world, free-flowing conversations.
- The study compared neural activity against two distinct types of embeddings from the Transformer-based Whisper model: "speech embeddings" from the model’s encoder and "language embeddings" from the decoder.
- A linear transformation mapped these embeddings onto the recorded brain signals, revealing that LLMs and the human brain share similar multidimensional spaces for coding linguistic information (a minimal sketch of this encoding approach follows this list).
- The alignment suggests that human language processing may rely on learned statistical structure and contextual embeddings rather than on traditional symbolic rules or syntactic parts of speech.
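The encoding approach can be illustrated with a short sketch. This is not the paper's actual pipeline: the Whisper checkpoint, the pooling choices, and the helper names (`whisper_embeddings`, `fit_encoding_model`) are illustrative assumptions, and the matrices `X` and `Y` stand in for word-aligned embeddings and intracranial recordings that the study collected.

```python
# Minimal sketch of a linear encoding model, assuming word-aligned neural
# data. Model choice, pooling, and helper names are illustrative only.
import numpy as np
import torch
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from transformers import WhisperModel, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperModel.from_pretrained("openai/whisper-base").eval()

def whisper_embeddings(audio, transcript, sr=16000):
    """Return a pooled speech (encoder) embedding and a final-token
    language (decoder) embedding for one utterance."""
    feats = processor(audio, sampling_rate=sr, return_tensors="pt").input_features
    ids = processor.tokenizer(transcript, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(input_features=feats, decoder_input_ids=ids)
    # Encoder states summarize the acoustics; decoder states carry
    # contextual language information about the transcribed words.
    speech = out.encoder_last_hidden_state.mean(dim=1)  # (1, d_model)
    language = out.last_hidden_state[:, -1, :]          # (1, d_model)
    return speech.numpy(), language.numpy()

# X: (n_words, d_model) embeddings; Y: (n_words, n_electrodes) neural
# signal averaged in a window around each word onset (hypothetical arrays).
def fit_encoding_model(X, Y):
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2,
                                              random_state=0)
    enc = RidgeCV(alphas=np.logspace(-2, 5, 8)).fit(X_tr, Y_tr)
    pred = enc.predict(X_te)
    # Encoding performance = Pearson r between predicted and observed
    # signal, computed separately for every electrode.
    r = np.array([np.corrcoef(pred[:, e], Y_te[:, e])[0, 1]
                  for e in range(Y.shape[1])])
    return enc, r
```

Ridge regression is a common choice for such encoding models because the embedding dimensionality is high relative to the number of words; the key claim being tested is only that a *linear* map suffices.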
Neural Sequences in Speech Comprehension
- When a subject listens to speech, the brain follows a specific chronological sequence that aligns with model representations.
- Initially, speech embeddings predict cortical activity in the superior temporal gyrus (STG), which is responsible for processing auditory speech sounds.
- A few hundred milliseconds later, language embeddings predict activity in Broca’s area (located in the inferior frontal gyrus), marking the transition from sound perception to decoding meaning (a lag-analysis sketch follows this list).
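One way to probe this temporal sequence is a lag sweep: re-fit the encoding model on neural activity extracted at different delays from each word's onset and find where prediction peaks. The sketch below reuses `fit_encoding_model` from the previous snippet; `neural_at_lag`, `speech_X`, and `language_X` are hypothetical names, not an API from the study.

```python
import numpy as np

def lag_profile(X, neural_at_lag, lags_ms):
    """Encoding correlation per electrode at each lag relative to word
    onset (positive lags = after the word is heard)."""
    profile = {}
    for lag in lags_ms:
        Y = neural_at_lag(lag)            # hypothetical: (n_words, n_electrodes)
        _, r = fit_encoding_model(X, Y)   # reuses the helper sketched above
        profile[lag] = r                  # Pearson r per electrode
    return profile

# Comprehension: speech embeddings should peak shortly after word onset in
# STG electrodes, and language embeddings a few hundred ms later in
# Broca's area (inferior frontal gyrus).
lags = np.arange(-100, 800, 50)           # lags in ms relative to word onset
# stg_profile = lag_profile(speech_X, neural_at_lag, lags)
# ifg_profile = lag_profile(language_X, neural_at_lag, lags)
```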
Reversed Dynamics in Speech Production
- During speech production, the neural sequence is reversed, beginning approximately 500 milliseconds before a word is articulated.
- Processing starts in Broca’s area, where language embeddings predict activity as the brain plans the semantic content of the utterance.
- This is followed by activity in the motor cortex (MC), aligned with speech embeddings, as the brain prepares the physical articulatory movements.
- Finally, after articulation, speech embeddings predict activity back in the STG, suggesting the brain is monitoring the sound of the speaker's own voice (see the production sketch after this list).
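Under the same assumptions as the comprehension sketch, the production analysis is just the mirrored lag sweep, with negative lags probing activity before the word is spoken; `lag_profile`, `speech_X`, `language_X`, and `neural_at_lag` remain hypothetical names.

```python
# Production mirrors comprehension: sweep negative lags so the encoding
# model predicts neural activity *before* the word is articulated.
prod_lags = np.arange(-700, 400, 50)      # ms; negative = pre-articulation
# broca_profile = lag_profile(language_X, neural_at_lag, prod_lags)
# motor_profile = lag_profile(speech_X, neural_at_lag, prod_lags)
# Expected pattern: language-embedding peaks in Broca's area ~500 ms before
# word onset, speech-embedding peaks in motor cortex near articulation, and
# a post-onset STG peak consistent with self-monitoring of the spoken word.
```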
This research validates the use of LLMs as powerful predictive tools for neuroscience, offering a new lens through which to study the temporal and spatial dynamics of human communication. By bridging the gap between artificial intelligence and cognitive biology, researchers can better model how the brain integrates sound and meaning in real time.