Transformer Architecture
The neural network architecture that underpins all modern large language models, introduced by Google in 2017.
Full Explanation
The Transformer architecture, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), replaced the recurrent neural networks that preceded it with a self-attention mechanism that processes all tokens in parallel. This parallelism enabled training on much larger datasets with far greater efficiency. Every major LLM (GPT, Claude, Gemini, LLaMA) is based on the Transformer architecture.
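The self-attention computation at the heart of the Transformer can be sketched in a few lines. The following is a minimal NumPy illustration of scaled dot-product attention, the core operation from the paper; it omits the learned query/key/value projection matrices and multi-head splitting that a real model would include:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    Q, K, V are (seq_len, d_k) arrays."""
    d_k = Q.shape[-1]
    # Similarity of each query to every key, scaled to keep softmax stable
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row becomes a distribution of attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each token is an attention-weighted mix of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, 8-dim embeddings
out, w = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once rather than step by step as in a recurrent network.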
GPT stands for Generative Pre-trained Transformer — named after the architecture it uses.
Related Terms
Attention Mechanism — The core innovation in Transformer models that allows AI to weigh the importance of different parts of the input text when generating each output token.
Large Language Model (LLM) — A type of AI model trained on vast amounts of text data that can generate, summarize, translate, and reason about language.
Foundation Model — A large AI model trained on broad data at scale that can be adapted for many different downstream tasks.