The Transformer is a neural network architecture first introduced in 2017 in the Google paper ‘Attention Is All You Need’ [@vaswaniAttentionAllYou2023]. It uses an attention mechanism to parallelise the training process and, as a result, drastically speeds up AI model development [@alammarIllustratedTransformer2018].

A few distinctive features set transformers apart from the previously dominant architecture, recurrent neural networks (RNNs).

Parallelisation

Unlike earlier architectures that process a sentence word by word, transformers process all the words of a sequence in parallel, which is made possible by positional encoding and attention.
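Because the words are no longer fed in one at a time, the model needs another way to know their order. The original paper does this by adding a sinusoidal positional encoding to each word’s embedding. The minimal NumPy sketch below illustrates the idea; the function name and dimensions are illustrative, not taken from any particular library.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as described in 'Attention Is All You Need'.

    Each position gets a unique pattern of sine/cosine values, so the model
    can distinguish word order even though all words are processed in parallel.
    """
    positions = np.arange(seq_len)[:, np.newaxis]        # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                     # shape (seq_len, d_model)

    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])          # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])          # odd dimensions: cosine
    return encoding

# Example: positions for a 10-word sentence with an embedding width of 16.
print(positional_encoding(10, 16).shape)  # (10, 16)
```

The encoding is simply added to the word embeddings, so no extra parameters are learned for word order.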

Attention

The attention mechanism gives transformers an efficient way to learn the context in which words are used. Each word attends only to the other words it needs in order to complete a given language task successfully. For example, if the model is asked to translate ‘The chicken crossed the road because it thought it was fun.’ into French, it must understand that the first ‘it’ refers to the chicken, not the road. The first ‘it’ must therefore be ‘attending’ to the word ‘chicken’ so that the model respects grammar rules such as gender and number agreement. On the other hand, ‘it’ has little to do with the word ‘because’, so it can pay little to no attention to that word.
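The core computation is the scaled dot-product attention defined in the original paper. The sketch below is a bare-bones NumPy illustration; in a real transformer the queries, keys and values are learned linear projections of the word embeddings, which are omitted here for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from 'Attention Is All You Need'.

    Each query (word) is compared with every key; the softmax weights say
    how much attention each word pays to every other word.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # similarity of each query to each key
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                            # weighted sum of values, attention map

# Toy example: 4 "words", each represented by a 3-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
output, attn = scaled_dot_product_attention(x, x, x)       # self-attention: Q = K = V
print(attn.round(2))   # row i shows how much word i attends to each other word
```

In the translation example above, a well-trained model would place a large weight in the row for ‘it’ under the column for ‘chicken’, and a near-zero weight under ‘because’.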

References