What is tokenization in LLMs?

Updated May 16, 2026

Short answer

Tokenization is the process of converting text into smaller units (tokens) that LLMs can process.

Deep explanation

LLMs do not process raw text directly. Instead, a tokenizer splits the text into tokens such as subwords, characters, or bytes, typically using an algorithm like byte-pair encoding (BPE). Each token maps to an integer ID in a fixed vocabulary. The model takes a sequence of token IDs as input and predicts the next token from it.
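To make the text-to-IDs pipeline concrete, here is a minimal sketch of BPE-style tokenization. It is a toy: the corpus, the number of merges, and the helper names (`train_bpe`, `merge_pair`, `tokenize`, `build_vocab`) are assumptions made up for this illustration, and production tokenizers add byte-level handling, pre-tokenization, and special tokens that are left out here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Every word starts as a sequence of single characters.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in training order, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_vocab(corpus, merges):
    """Map every base character and every merged symbol to an integer ID."""
    vocab = {}
    for ch in sorted({ch for word in corpus for ch in word}):
        vocab.setdefault(ch, len(vocab))
    for a, b in merges:
        vocab.setdefault(a + b, len(vocab))
    return vocab


# Hypothetical toy corpus, chosen only for illustration.
corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = train_bpe(corpus, num_merges=4)
vocab = build_vocab(corpus, merges)

tokens = tokenize("lowest", merges)
ids = [vocab[t] for t in tokens]  # a character never seen in the corpus would raise KeyError
print(tokens, ids)
```

The model only ever sees the `ids` list. Frequent character sequences collapse into single tokens, which is why common words tend to cost fewer tokens than rare ones.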

Real-world example

Hosted LLM APIs commonly bill by the number of tokens processed per request, so longer prompts and longer responses cost more.
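A rough cost estimate follows directly from token counts. The sketch below assumes a provider that prices input and output tokens separately per million tokens; the `Pricing` class, the `estimate_cost` helper, and the dollar figures are hypothetical placeholders, not any real provider's rates.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens, output_tokens, pricing):
    """Estimate the cost of one request from its token counts."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Made-up rates and token counts, for illustration only.
pricing = Pricing(input_per_million=2.00, output_per_million=8.00)
cost = estimate_cost(input_tokens=1_200, output_tokens=300, pricing=pricing)
print(f"Estimated cost: ${cost:.4f}")
```

In practice the token counts would come from the provider's own tokenizer or from the usage figures returned with the response, since different models tokenize the same text differently.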

Common mistakes

  • Assuming tokens equal words. One word can split into several tokens, and punctuation can be a token of its own.
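The gap between word count and token count is easy to see with a small experiment. This sketch uses a greedy longest-match splitter over a made-up subword vocabulary; it is neither BPE nor any real model's tokenizer, and `VOCAB`, `greedy_subwords`, and `toy_tokenize` are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical subword vocabulary, invented for this example.
VOCAB = {"token", "ization", "is", "un", "believ", "able", "."}


def greedy_subwords(piece, vocab):
    """Split one piece by greedy longest match; unknown characters become single-character tokens."""
    tokens, i = [], 0
    while i < len(piece):
        for j in range(len(piece), i, -1):
            if piece[i:j] in vocab:
                tokens.append(piece[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here, so emit one character.
            tokens.append(piece[i])
            i += 1
    return tokens


def toy_tokenize(text, vocab):
    """Lowercase, separate words from punctuation, then split each piece into subwords."""
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    return [tok for piece in pieces for tok in greedy_subwords(piece, vocab)]


text = "Tokenization is unbelievable."
words = text.split()
tokens = toy_tokenize(text, VOCAB)
print(len(words), "words ->", len(tokens), "tokens:", tokens)
```

Here three whitespace-separated words become seven tokens, so budgeting context length or cost by word count would underestimate the real usage.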

Follow-up questions

  • Why are tokens important for cost?
  • What is BPE?
