What is Positional Encoding in Transformers and why is it necessary?

Updated May 16, 2026

Short answer

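Positional encoding injects information about token order into Transformer models. It is needed because self-attention on its own is permutation-equivariant: shuffling the input tokens only shuffles the outputs, so the model cannot tell different orderings of the same tokens apart.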

Deep explanation

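Unlike RNNs, which read tokens one step at a time, Transformers process all tokens in parallel with self-attention, which compares every token with every other token regardless of where each one sits. As a result, they have no built-in notion of sequence order.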

Without positional information:

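  • 'dog bites man' and 'man bites dog' contain the same tokens, so each word receives the same attention output in both sentences and the model cannot tell who bit whom.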

Positional encoding solves this by adding order-aware signals to embeddings.
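The problem and the fix can be checked with a toy example. This is a sketch under simplifying assumptions: a single attention head with identity query/key/value projections (learned per-token projections would not change the result), plus hypothetical 2-dimensional token embeddings (`EMB`) and position vectors (`POS`) chosen only for illustration. Without positions, "dog" gets exactly the same output in both sentences; once a position vector is added, it no longer does.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(xs):
    """Scaled dot-product self-attention, one head, identity Q/K/V projections (simplification)."""
    d = len(xs[0])
    out = []
    for q in xs:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in xs])
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out

# Hypothetical 2-d token embeddings, for illustration only.
EMB = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}

# Hypothetical position vectors, one per index (any distinct vectors would do).
POS = [[0.0, 0.3], [0.3, 0.0], [0.2, -0.2]]

def encode(tokens, use_positions):
    xs = [list(EMB[t]) for t in tokens]
    if use_positions:
        # Add the position vector for index i to the embedding at index i.
        xs = [[a + b for a, b in zip(x, POS[i])] for i, x in enumerate(xs)]
    return self_attention(xs)

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

s1 = ["dog", "bites", "man"]
s2 = ["man", "bites", "dog"]

# Without positions: "dog" (index 0 in s1, index 2 in s2) gets the same output vector.
no_pos_1, no_pos_2 = encode(s1, False), encode(s2, False)
print(close(no_pos_1[0], no_pos_2[2]))  # True

# With positions added: the output for "dog" now depends on where it appears.
pos_1, pos_2 = encode(s1, True), encode(s2, True)
print(close(pos_1[0], pos_2[2]))  # False
```

The tolerance in `close` only absorbs floating-point rounding from summing the same terms in a different order.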

Types:

  1. Sinusoidal Positional Encoding:
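  • Uses fixed sine and cosine functions of the position at geometrically spaced frequencies; nothing is learned.
  • Gives every position a distinct, deterministic vector and can be evaluated for any position, since no table has to be trained.
  • For any fixed offset k, PE(pos + k) is a linear function of PE(pos), which the original Transformer authors hypothesized would help the model attend by relative position.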

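Formula:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$

where pos is the token's position in the sequence, d is the embedding dimension, and i runs from 0 to d/2 - 1, so each (sine, cosine) pair of dimensions shares one frequency.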
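A minimal pure-Python sketch of the formula above, assuming an even d and the usual base of 10000; `sinusoidal_encoding` and `shift` are hypothetical helper names, and real frameworks would compute the whole table with vectorized tensor operations. `shift` illustrates the linear-offset property: it turns PE(pos) into PE(pos + k) by rotating each (sin, cos) pair by an angle that depends only on k.

```python
import math

def sinusoidal_encoding(pos, d, base=10000.0):
    """Return the d-dimensional sinusoidal encoding of integer position `pos`.

    Index 2i holds sin(pos / base^(2i/d)); index 2i+1 holds the matching cos.
    """
    if d % 2 != 0:
        raise ValueError("d must be even")
    pe = [0.0] * d
    for i in range(d // 2):
        angle = pos / base ** (2 * i / d)
        pe[2 * i] = math.sin(angle)
        pe[2 * i + 1] = math.cos(angle)
    return pe

def shift(pe, k, base=10000.0):
    """Map PE(pos) to PE(pos + k) with a fixed 2x2 rotation per (sin, cos) pair.

    The rotation depends only on the offset k, never on pos itself.
    """
    d = len(pe)
    out = [0.0] * d
    for i in range(d // 2):
        w = 1.0 / base ** (2 * i / d)  # frequency of pair i
        s, c = pe[2 * i], pe[2 * i + 1]
        out[2 * i] = s * math.cos(w * k) + c * math.sin(w * k)      # sin(w(pos+k))
        out[2 * i + 1] = c * math.cos(w * k) - s * math.sin(w * k)  # cos(w(pos+k))
    return out

d = 8
pe0 = sinusoidal_encoding(0, d)
print(pe0)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

# Encoding table for a 4-token sequence; each row is added to that token's embedding.
table = [sinusoidal_encoding(p, d) for p in range(4)]

# Shifting PE(5) by k = 3 reproduces PE(8) up to rounding.
err = max(abs(a - b) for a, b in zip(shift(sinusoidal_encoding(5, d), 3), sinusoidal_encoding(8, d)))
print(err < 1e-9)  # True
```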

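  2. Learned Positional Embeddings:
  • A trainable vector for each position index up to a fixed maximum sequence length, learned jointly with the rest of the model (as in BERT and GPT-2).
  • Positions beyond that maximum have no vector, so longer inputs cannot be encoded without extending the table.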
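A hypothetical sketch of the lookup side of learned positional embeddings; the class name, the Gaussian initialization, and the `max_len`/`d` parameters are illustrative assumptions, and the training step is omitted. In PyTorch the table would typically be a `torch.nn.Embedding(max_len, d)` whose weights are updated by backpropagation.

```python
import random

class LearnedPositionalEmbedding:
    """Hypothetical sketch: one d-dimensional vector per position, up to max_len.

    Only the lookup-and-add is shown; in a real model the table entries are
    trainable parameters updated along with the rest of the network.
    """

    def __init__(self, max_len, d, seed=0):
        rng = random.Random(seed)
        self.max_len = max_len
        # Small random initialization; placeholders, not trained weights.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(max_len)]

    def __call__(self, token_embeddings):
        n = len(token_embeddings)
        if n > self.max_len:
            # No vector exists for positions >= max_len.
            raise ValueError(f"sequence length {n} exceeds max_len {self.max_len}")
        # Add the vector for position p to the token embedding at position p.
        return [[t + p for t, p in zip(tok, self.table[pos])]
                for pos, tok in enumerate(token_embeddings)]

pos_emb = LearnedPositionalEmbedding(max_len=4, d=3)
x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = pos_emb(x)
```

The fixed `max_len` is the practical difference from the sinusoidal scheme: a sequence longer than the table raises an error here rather than receiving an encoding.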

3.…
