Attention Mechanism
Architecture

Simple Definition
The core innovation in Transformer models that allows AI to weigh the importance of different parts of input text when generating each output token.
Full Explanation
Self-attention allows each token in a sequence to 'pay attention' to every other token when computing its representation. This enables models to capture long-range dependencies in text — understanding that 'it' in a sentence refers to something mentioned several paragraphs earlier. Multi-head attention runs this process in parallel multiple times to capture different types of relationships.
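The mechanism described above can be sketched as scaled dot-product attention. This is a minimal NumPy illustration, not production code; the matrix names (X, Wq, Wk, Wv) and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X          : (seq_len, d_model) token embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices (illustrative)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # each token scored against every other token
    weights = softmax(scores, axis=-1)     # attention weights; each row sums to 1
    return weights @ V                     # each output is a weighted mix of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per input token
```

Multi-head attention would run several such computations in parallel with different projection matrices and concatenate the results, letting each head specialize in a different kind of relationship.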
Related Terms
Last verified: 2026-03-30