What is a token in NLP?

Tokens are the atoms of language models.

OpenAI Documentation

How It Works:

A token is a chunk of text (a word, a subword, or a single character) that a model treats as one unit. Tokenization splits the input into these units and maps each one to an integer ID before inference.
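The splitting and ID-mapping steps can be sketched with a toy word-level tokenizer. This is only an illustration under simplifying assumptions: the function names `tokenize` and `build_vocab` are hypothetical, and production models typically use learned subword schemes rather than a regex split.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens (toy scheme)."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = tokenize("Tokens are the atoms of language models.")
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]  # the integer sequence a model would consume

print(tokens)
print(ids)
```

Note that the trailing period becomes its own token, so the sentence yields one more token than it has words.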


Key Benefits:

Turns text of any length into a sequence of units drawn from a fixed vocabulary
Enables subword handling of rare or unseen words
Provides a unit for measuring and capping the sequence length a model receives
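To illustrate subword handling of rare words, here is a minimal sketch of greedy longest-match splitting, similar in spirit to how some subword tokenizers segment a word at inference time. The vocabulary `VOCAB` and the function `subword_split` are made up for this example; real vocabularies are learned from data and contain tens of thousands of entries.

```python
def subword_split(word: str, vocab: set[str]) -> list[str]:
    """Greedily split a word into the longest pieces found in vocab.

    Falls back to a single character when no known piece matches,
    so every word can still be represented.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1  # unknown character: emit it on its own
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical toy vocabulary (assumption for illustration only).
VOCAB = {"token", "ization", "un", "believ", "able"}

print(subword_split("tokenization", VOCAB))
print(subword_split("unbelievable", VOCAB))
```

A word that never appeared whole in the vocabulary is still covered by pieces that did, which is why the model needs no catch-all "unknown word" entry for it.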


Real-World Use Cases:

Measuring prompt cost (token-based billing)
Truncating long inputs for context windows
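Both use cases reduce to simple arithmetic on token counts. The sketch below assumes the input has already been converted to a list of token IDs; the helper names, the price, and the context-window size are hypothetical placeholders, not values from any provider.

```python
def estimate_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Estimate the cost of a prompt under per-1,000-token pricing."""
    return num_tokens / 1000 * price_per_1k_tokens

def truncate(token_ids: list[int], max_tokens: int, keep_end: bool = False) -> list[int]:
    """Trim a token sequence to fit a context window.

    Keeps the first max_tokens tokens by default, or the last
    max_tokens when keep_end is True (useful for chat history).
    """
    if max_tokens <= 0:
        return []
    if len(token_ids) <= max_tokens:
        return list(token_ids)
    return token_ids[-max_tokens:] if keep_end else token_ids[:max_tokens]

prompt_ids = list(range(10))   # stand-in for a tokenized prompt
PRICE_PER_1K = 0.002           # hypothetical price, not a real rate
CONTEXT_WINDOW = 4             # hypothetical limit, tiny for demonstration

fitted = truncate(prompt_ids, CONTEXT_WINDOW)
print(fitted, estimate_cost(len(fitted), PRICE_PER_1K))
```

Whether to keep the start or the end of an over-long input depends on the application: a document summary usually needs the beginning, while a conversation usually needs the most recent turns.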
