Triple Helix

The Neural Network in Our Heads: How Transformer Architectures Mirror the Human Brain

Written by Andrew Ni ’26

Edited by Wonjin Ko ’26

The human brain is one of the universe’s most complex and sophisticated systems. It is capable of abstract thought, language processing, and decision-making, and it is the foundation of our behavior and personality. In recent years, researchers have developed artificial intelligence (AI) systems that can perform some of the same tasks as the human brain. One such system is the transformer, a neural network architecture that, like other neural networks, is loosely inspired by the structure of the brain. Transformer-based models such as BERT, GPT-3, and ChatGPT have revolutionized natural language processing with their ability to understand language and assist humans [2]. Here, we will explore the similarities between transformer architectures and the human brain and how we can use them to advance our understanding of how the brain processes information.

A general transformer architecture is shown in the figure above [3]. The model has an encoder-decoder structure with multi-head attention mechanisms, which allow different parts of the input to be processed and combined, similar to neurons in the brain. Unlike many earlier neural networks, which connect each input element only to specific others, transformer architectures connect each input element (whether it is a word, pixel, or number in a sequence) to every other input element. Although transformers were initially developed for language-related tasks, they have shown exceptional performance in other tasks, such as image classification [4].

Furthermore, transformer architectures are hierarchically organized with multiple layers of processing units. The original transformer stacked 6 encoder and 6 decoder layers [3], and later models use many more; each layer represents a different level of abstraction. The lower layers tend to detect simple patterns, while the higher layers recognize more complex relationships. Similarly, the brain's ~100 billion neurons are organized hierarchically, with cortical processing stages that build upon one another [5]. In the visual cortex, for example, early stages detect basic features like edges and shapes, while later stages recognize objects, faces, scenes, and abstract concepts. These hierarchical architectures allow for increasingly abstract representations and varied pattern analysis.

As a simplified illustration, a transformer encoder with 6 layers on top of an initial embedding layer might divide the work as follows:

  1. Embedding Layer: Learns word/token representations

  2. Layer 1: Detects simple syntax patterns and relationships between words

  3. Layer 2: Recognizes basic phrases and word clusters

  4. Layer 3: Understands higher-level semantics and topic-based groupings

  5. Layer 4: Analyzes complex ideas and multi-clause sentences

  6. Layers 5-6: Build deeply contextualized representations that support nuanced, near human-level language understanding

Specifically, transformers are similar to the cerebral cortex, the part of the brain responsible for language, thought, perception, emotion, and planning. The cortex, like transformers, has a hierarchical structure with multiple “layers” of neural connections that process inputs at different levels of abstraction and combine information from various sources. The “memory” of a transformer, its access to earlier parts of the input through attention, also parallels how the cortex stores and retrieves information from other points in time to understand language and generate coherent thoughts.

The main feature of transformers is their attention mechanism, which allows different parts of the network to focus on the most meaningful information. At each layer, attention scores how relevant every element of the sequence is to every other element and uses those scores to compute a weighted sum of the elements' vector representations, which is passed on to the next layer. The attention “heads” can focus on different parts of the sequence in parallel, just as humans can concurrently attend to multiple information streams. More importantly, attention can pick out essential inputs and connect different concepts. For example, when you converse with a friend at a noisy party, your brain can attend to their voice amidst various other sounds (known as the cocktail party effect) [6]. Your brain also constantly makes associations between the current discussion and previous related conversations, indicating an attention-based linkage between memories.
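To make the weighted-sum idea concrete, here is a minimal sketch of scaled dot-product attention, the formula described in [3], written in plain Python. The token vectors are made-up numbers, and the sketch assumes the same vectors serve directly as queries, keys, and values; a real transformer first passes its inputs through learned projections, which are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output per query.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors (illustrative values only). Self-attention
# uses the same sequence as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, including itself. A multi-head model would run several such computations in parallel on different projections of the same input.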

Moreover, the brain is adept at processing sequential information, such as speech, language, motor skills, tasks, and temporal reasoning. Similarly, transformers are designed to handle sequential data, such as natural language, with a proficiency approaching human level. The two maintain context in different ways: the brain relies on recurrent connections between neurons, whereas a transformer is a feedforward network whose attention mechanism looks back over everything it has already read or generated. In both cases, the result resembles an “ongoing internal dialogue” that keeps earlier information available. In this sense, transformers and the brain are both optimized for sequence learning, generation, and processing.

Recent research has indicated that transformer models can have a powerful impact on our understanding of the brain and its computations [7,8]. Grid cells are a specific type of neuron that fire at regularly spaced locations as an animal moves through its environment, which allows animals to understand their position in space [9]. Whittington and others found that transformers, given a suitable form of position encoding, are closely related mathematically to existing models of the hippocampal formation and reproduce grid-cell-like firing patterns [10]. In other words, they discovered transformers could determine their current location by analyzing their past states and movements in a manner that aligns with conventional models of grid cells.
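The idea of working out a current location from past states and movements is often called path integration. The toy sketch below illustrates only that idea; it is not the model from [10], and the function name and the step values are hypothetical.

```python
def integrate_path(start, movements):
    """Return the position after each movement, beginning at `start`."""
    x, y = start
    trajectory = [(x, y)]
    for dx, dy in movements:
        # The new position depends only on the previous position
        # and the movement just made.
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

# A short walk that ends where it began (illustrative steps).
steps = [(1, 0), (1, 0), (0, 1), (-2, 0), (0, -1)]
trajectory = integrate_path((0, 0), steps)
print(trajectory[-1])  # (0, 0)
```

No landmark or external position signal is used: the final location is recovered purely from the starting point and the history of movements.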

Moreover, Tang and Ha have designed a transformer-based model that receives large amounts of sensory data in random order, mimicking how the human body transmits sensory observations to the brain [11]. This ability to handle a disordered flow of information is yet another indication of the potential applications of transformer models to neuroscience. In 2021, Schrimpf et al. analyzed 43 different neural network models to see how well they predicted human neural activity as recorded by fMRI and electrocorticography [12]. Transformers were the most accurate, with the best models predicting almost all of the explainable variance in the neural recordings. Finally, a recent study has shown that deep learning algorithms like transformers partially converge to brain-like representations during training. As the researchers put it, “This result is not trivial: the representations that are optimal to predict masked or future words from large amounts of text could have been very distinct from those the brain learns to generate” [13].
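One reason attention copes with inputs arriving in an arbitrary order is that a softmax-weighted sum does not depend on the order of its terms. The sketch below demonstrates that property with a single fixed query; it is an illustration under simplifying assumptions, not the model from [11]. The query and observation values are made up, and the observations are used directly as keys and values.

```python
import math

def attention_pool(query, observations):
    """Pool a set of observation vectors into one vector with a fixed query."""
    scores = [sum(q * o for q, o in zip(query, obs)) for obs in observations]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(observations[0])
    # Weighted sum over all observations; reordering them only
    # reorders the terms of each sum.
    return [sum(w * obs[j] for w, obs in zip(weights, observations))
            for j in range(dim)]

query = [0.5, -1.0]  # hypothetical fixed query vector
observations = [[0.2, 0.1], [1.5, -0.3], [-0.7, 0.9], [0.0, 0.4]]
reordered = observations[::-1]  # same observations, different order

original = attention_pool(query, observations)
permuted = attention_pool(query, reordered)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, permuted)))
```

The comparison uses a small tolerance because floating-point addition can differ in the last digits when the terms are summed in a different order.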

Recently, one of the most impressive applications of transformer models has been OpenAI’s ChatGPT, a chatbot built on a large language model that can conduct basic conversations, answer questions, and generate creative fiction. While still limited and imperfect, ChatGPT showcases the potential of these neural networks. With more data and training, the output of such models may become hard to distinguish from that of humans.

Still, at the end of the day, the brain is far more complex than even the most advanced neural networks today. The brain has roughly 100 billion neurons and 100 trillion connections, compared to transformers with billions of parameters. The brain also develops its connections over years of growth and learning, whereas transformers are trained for a limited time on a fixed dataset. The brain is the result of evolution, while transformers are designed by humans. As Caucheteux and King put it: “the brain is trained with a recurrent architecture and on a relatively small amount of grounded sentences, while transformers are trained with a massive feedforward architecture and on huge text databases” [13].

So while they share similarities in how they process information, transformers are still narrow in scope and lack many powerful capabilities that emerge from the brain's immense biological complexity. The similarity between deep networks and the brain provides a stepping stone toward unraveling the foundations of natural language processing, but identifying the remaining differences between these two systems remains a significant challenge to building algorithms that learn and think like humans.

In conclusion, transformers and the human brain share striking similarities in their hierarchical organization, attention mechanisms, and ability to process sequential information. These similarities suggest that the strategies used by large language models in computers are related to the processes of the human brain when processing natural language. As we develop increasingly advanced AI, insights from both cognitive science and neuroscience may help guide further progress. Our brains may very well be the key to building more human-like artificial intelligence.



[1] Fingas J. OpenAI will soon test a paid version of its hit Chatgpt Bot [Internet]. Engadget. 2023 [cited 2023Mar6]. Available from:

[2] Brown scholars put their heads together to decode the neuroscience behind chatgpt [Internet]. Brown University. 2023 [cited 2023Mar6]. Available from:

[3] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need [Internet]. 2017 [cited 2023Mar6]. Available from:

[4] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale [Internet]. 2021 [cited 2023Mar6]. Available from:

[5] Hilgetag CC, Goulas A. ‘hierarchy’ in the organization of Brain Networks. Philosophical Transactions of the Royal Society B: Biological Sciences. 2020;375(1796):20190319.

[6] American Academy of Audiology. The cocktail party effect [Internet]. The American Academy of Audiology. 2022 [cited 2023Mar6]. Available from:

[7] Ornes S. How AI transformers mimic parts of the brain [Internet]. Quanta Magazine. 2022 [cited 2023Mar6]. Available from:

[8] Strohmaier D. Transformers and the brain: Literature notes [Internet]. Transformers and the Brain: Literature Notes. [cited 2023Mar6]. Available from:

[9] Hafting T, Fyhn M, Molden S, Moser M-B, Moser EI. Microstructure of a spatial map in the Entorhinal Cortex. Nature. 2005;436(7052):801–6.

[10] Whittington JCR, Warren J, Behrens TEJ. Relating transformers to models and neural representations of the Hippocampal Formation [Internet]. 2022 [cited 2023Mar6]. Available from:

[11] Tang Y, Ha D. The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning [Internet]. 2021 [cited 2023Mar6]. Available from:

[12] Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher N, et al. The Neural Architecture of language: Integrative modeling converges on Predictive Processing. Proceedings of the National Academy of Sciences. 2021;118(45).

[13] Caucheteux C, King J-R. Brains and algorithms partially converge in Natural Language Processing. Communications Biology. 2022;5(1).
