Token
Core Concepts
The basic unit of text that AI language models process — roughly equivalent to 3/4 of a word in English.
Full Explanation
Tokenization breaks text into smaller pieces (tokens) before an LLM processes it. One token is approximately 4 characters or 0.75 words in English. '1 million tokens' is roughly 750,000 words or about 1,500 pages. API pricing is measured per token — knowing token counts helps estimate costs.
The sentence 'Hello, how are you?' is approximately 6 tokens.
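The ~4-characters-per-token rule of thumb above can be sketched as a quick cost estimator. This is a minimal illustration, not a real tokenizer: the function names and the price parameter are hypothetical, and exact counts require a model-specific tokenizer (e.g. OpenAI's tiktoken library), which is why the heuristic gives 5 tokens here rather than the ~6 an actual tokenizer produces.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic for English."""
    return max(1, round(len(text) / 4))

def estimate_cost_usd(text: str, price_per_million_tokens: float) -> float:
    """Estimate API cost for an input, given a (hypothetical) per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

# 'Hello, how are you?' is 19 characters -> ~5 tokens by this heuristic
print(estimate_tokens("Hello, how are you?"))
```

For budgeting real workloads, run the same arithmetic with exact token counts from the provider's tokenizer; the heuristic is only for back-of-envelope estimates.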
Related Terms
Context Window: The maximum amount of text (measured in tokens) that an AI model can 'see' and process in a single interaction.
Tokenization: The process of splitting text into tokens (the basic units an LLM processes) before feeding it to a model.
Large Language Model (LLM): A type of AI model trained on vast amounts of text data that can generate, summarize, translate, and reason about language.