What is a token in NLP?

Tokens are the atoms of language models.
OpenAI Documentation

How It Works:

A token is a chunk of text (a word, a subword, or a single character) that a model treats as one unit. Tokenization splits the input into these units before the model runs inference.
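As a rough illustration of the word and character granularities mentioned above, here is a minimal sketch. The function names `word_tokens` and `char_tokens` are made up for this example, and the regex is one simple splitting rule, not the rule any particular model uses.

```python
import re

def word_tokens(text):
    # Hypothetical helper: each run of word characters is one token,
    # and each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokens(text):
    # Hypothetical helper: every character becomes a token.
    return list(text)

print(word_tokens("Tokenizers split text."))  # ['Tokenizers', 'split', 'text', '.']
print(char_tokens("text"))                    # ['t', 'e', 'x', 't']
```

Word-level splitting keeps sequences short but needs a very large vocabulary. Character-level splitting needs a tiny vocabulary but produces long sequences. Subword schemes sit between the two.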

Key Benefits:

  • Converts text of any length into a uniform sequence of units
  • Enables subword handling of rare words
  • Gives models a measurable unit for bounding sequence length
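To make the subword point concrete, the sketch below splits a word by greedy longest-match against a small vocabulary. The vocabulary `TOY_VOCAB`, the function name `subword_tokens`, and the `[UNK]` marker are assumptions for illustration. Production tokenizers learn their vocabularies from data and differ in the details.

```python
def subword_tokens(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matches at this position: give up on the whole word.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces

# Toy vocabulary, hand-picked for this example only.
TOY_VOCAB = {"token", "ization", "un", "believ", "able"}

print(subword_tokens("tokenization", TOY_VOCAB))  # ['token', 'ization']
print(subword_tokens("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
print(subword_tokens("xyz", TOY_VOCAB))           # ['[UNK]']
```

A word that never appears whole in the vocabulary can still be represented by pieces that do, which is how rare words avoid collapsing into a single unknown token.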

Real-World Use Cases:

  • Measuring prompt cost (token-based billing)
  • Truncating long inputs for context windows
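Both use cases reduce to counting tokens. The sketch below uses whitespace splitting as a stand-in tokenizer. The price and the limit are hypothetical parameters, not real rates or real model limits. A real tokenizer would usually report more tokens than a whitespace split does.

```python
def count_tokens(text):
    # Stand-in tokenizer (assumption): one token per whitespace-separated chunk.
    return len(text.split())

def estimate_cost(text, price_per_1k_tokens):
    # price_per_1k_tokens is a hypothetical rate supplied by the caller.
    return count_tokens(text) / 1000 * price_per_1k_tokens

def truncate_to_limit(text, max_tokens):
    # Keep only the first max_tokens tokens so the input fits the window.
    return " ".join(text.split()[:max_tokens])

prompt = "one two three four five six"
print(count_tokens(prompt))          # 6
print(truncate_to_limit(prompt, 4))  # one two three four
print(estimate_cost(prompt, price_per_1k_tokens=0.5))
```

Truncating from the end is the simplest policy. Depending on the task, it can be better to drop the oldest or least relevant text first.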

FAQs

How many tokens are in a word?
What's a token limit?