What is Positional Encoding in Transformers and why is it necessary?
Updated May 16, 2026
Short answer
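Positional encoding injects token-order information into Transformer models, because self-attention on its own is order-agnostic: permuting the input tokens only permutes the outputs (strictly, it is permutation-equivariant), so position never enters the computation.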
Deep explanation
Unlike RNNs, Transformers process tokens in parallel, which means they do not inherently understand sequence order.
Without positional information:
- 'dog bites man' and 'man bites dog' look the same to the model: the same set of tokens, with nothing to say which came first.
Positional encoding solves this by adding order-aware signals to embeddings.
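To make the order-blindness concrete, here is a minimal sketch in plain Python. It assumes a toy single-head self-attention with identity query/key/value projections, made-up 2-dimensional embeddings (`EMB`), mean pooling over the outputs, and an arbitrary position signal (`toy_position`). All of these names and values are illustrative assumptions, not taken from any real model.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Toy self-attention: identity Q/K/V projections (a simplifying assumption)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

def mean_pool(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

# Hypothetical token embeddings, invented for this illustration.
EMB = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}

def toy_position(pos):
    # An arbitrary order-aware signal; NOT the sinusoidal formula.
    return [0.1 * pos, -0.1 * pos]

def encode(sentence, pos_signal=None):
    X = [list(EMB[tok]) for tok in sentence.split()]
    if pos_signal is not None:
        # Add the position signal to each token embedding.
        X = [[x + p for x, p in zip(row, pos_signal(pos))]
             for pos, row in enumerate(X)]
    return mean_pool(self_attention(X))

def close(u, v):
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(u, v))

# Without positions, both word orders collapse to the same representation.
same_without = close(encode("dog bites man"), encode("man bites dog"))
# With positions added, the two orders become distinguishable.
same_with = close(encode("dog bites man", toy_position),
                  encode("man bites dog", toy_position))
```

Under these assumptions `same_without` is true and `same_with` is false: once an order-dependent signal is added to the embeddings, the two sentences no longer produce the same pooled output.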
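Types:
1. Sinusoidal Positional Encoding:
   - Uses sine and cosine functions at different frequencies.
   - Provides a continuous position representation.
   - Formula, where pos is the token position, i indexes a pair of dimensions, and d is the embedding dimension:
     PE(pos, 2i) = sin(pos / 10000^(2i/d))
     PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
2. Learned Positional Embeddings:
   - Trainable position vectors, one per position, learned along with the rest of the model.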
3.…
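As a rough illustration of the first two approaches, the sketch below computes the sinusoidal table directly from the formula and adds it to token embeddings. The function name `sinusoidal_positional_encoding`, the sizes, and the placeholder embeddings are assumptions made for this example. The learned variant is shown only as a randomly initialized lookup table that training would then update; the initialization scale is also an assumption.

```python
import math
import random

def sinusoidal_positional_encoding(max_len, d_model, base=10000.0):
    """Build a max_len x d_model table from the sin/cos formula."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(d_model // 2):
            angle = pos / (base ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)      # even dimensions
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimensions
    return pe

max_len, d_model = 4, 6
pe = sinusoidal_positional_encoding(max_len, d_model)

# Placeholder token embeddings (all 0.5), standing in for real ones.
token_embeddings = [[0.5] * d_model for _ in range(max_len)]

# The encoding is added element-wise to the embeddings.
inputs = [[e + p for e, p in zip(emb, row)]
          for emb, row in zip(token_embeddings, pe)]

# Learned alternative: a table of the same shape, randomly initialized
# here and treated as trainable parameters in a real model.
rng = random.Random(0)
learned_pe = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
              for _ in range(max_len)]
```

Position 0 always encodes to alternating 0 and 1, since sin(0) = 0 and cos(0) = 1. A fixed sinusoidal table needs no training and can be computed for any position, whereas a learned table only covers the positions it was sized for.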