What is tokenization in LLMs?

Updated May 16, 2026

Short answer

Tokenization is the process of converting text into smaller units (tokens) that LLMs can process.

Deep explanation

LLMs do not process raw text directly. Instead, text is split into tokens such as subwords or characters using tokenizers like BPE. Each token maps to an integer ID. The model processes sequences of token IDs to generate predictions.

Real-world example

API billing is based on number of tokens processed per request.

Common mistakes

Assuming tokens equal words.

Follow-up questions

Why are tokens important for cost?
What is BPE?

Short answer

Deep explanation

Real-world example

Common mistakes

Follow-up questions

More LLMOps interview questions