junior | LLMs
What is self-attention in LLMs?
Updated May 16, 2026
Short answer
Self-attention lets each token weigh the other tokens in the input sequence and build its representation from the most relevant ones.
Deep explanation
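Self-attention projects each token into a query, a key, and a value vector. A token's attention scores are the dot products of its query with every token's key, scaled by the square root of the key dimension and normalized with softmax. The resulting weights determine how much each token's value contributes to the output, which gives every token a context-aware representation.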
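The query, key, and value steps can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not how any particular LLM implements it: the names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical, the projection matrices are random rather than learned, and masking and multiple heads are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices (hypothetical names)
    Returns (output, weights); weights[i, j] is how much token i attends to token j.
    """
    q = x @ w_q  # queries, one row per token
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    output = weights @ v                # weighted sum of value vectors
    return output, weights


# Random stand-ins for embeddings and learned weights (assumption: a real
# model would supply trained values here).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4  # six tokens, as in 'The cat sat on the mat'
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_k))

output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.shape)  # (6, 6): one row of attention weights per token
print(output.shape)   # (6, 4): one context-aware vector per token
```

Because the inputs are random, the individual weights carry no linguistic meaning. The point is the shape of the computation: every token gets one row of weights over all tokens, and its output is the corresponding weighted sum of value vectors.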
Real-world example
In 'The cat sat on the mat', a trained model can give 'sat' a high attention weight on 'cat', linking the verb to its subject.
Common mistakes
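- Confusing attention with memory. Attention is recomputed from the tokens currently in the context window; it does not by itself store information across separate inputs.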
Follow-up questions
- What are Q, K, V matrices?
- What is multi-head attention?