AI & Machine Learning

Understanding LLM Tokens: How AI Models Count Words

Learn what tokens are in large language models, how tokenization works, and why understanding tokens is crucial for optimizing AI costs and performance.

By Inventive HQ Team

If you've ever used ChatGPT, Claude, or any other AI assistant, you've interacted with tokens without even knowing it. Tokens are the fundamental unit of how large language models (LLMs) read, process, and generate text. Understanding tokens isn't just academic knowledge—it directly impacts your API costs, context limits, and how effectively you can use AI tools.

In this guide, we'll demystify tokenization, explain how different models count tokens, and show you practical techniques for estimating and optimizing token usage.

What Are Tokens?

Tokens are the atomic units that LLMs process. Rather than reading text character-by-character or word-by-word, language models break text into tokens—pieces that might be whole words, parts of words, punctuation, or even individual characters.

Consider this sentence: "Tokenization is fascinating!"

A tokenizer might break this into:

  • "Token" (1 token)
  • "ization" (1 token)
  • " is" (1 token, note the space is included)
  • " fascinating" (1 token)
  • "!" (1 token)

That's 5 tokens for 3 words and 1 punctuation mark. This example illustrates a key principle: tokens don't map 1:1 to words.
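
If you want to see how a real tokenizer splits this sentence, you can inspect it with OpenAI's tiktoken library (a quick sketch; the exact pieces vary from model to model):

# Inspect how the cl100k_base tokenizer (GPT-4 / GPT-3.5) splits a sentence.
# The exact pieces differ across models and tokenizers.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
token_ids = encoding.encode("Tokenization is fascinating!")
pieces = [encoding.decode([tid]) for tid in token_ids]
print(pieces)          # the individual token strings
print(len(token_ids))  # the token count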

Why Tokens Instead of Words?

You might wonder why AI models don't simply process whole words. There are several compelling reasons:

1. Vocabulary Size Management

English has over 170,000 words in current use, plus technical jargon, proper nouns, and foreign words. Including every possible word would create an unwieldy vocabulary. Tokenization reduces vocabulary to a manageable 32,000-100,000 tokens while still covering all possible text.

2. Handling Unknown Words

What happens when someone types "ChatGPTification" or makes a typo like "teh"? Word-based systems would fail. Subword tokenization gracefully handles any text by breaking unknown words into known pieces.

3. Multilingual Support

Tokens work across languages. Chinese characters, Arabic script, and emoji can all be tokenized without needing separate vocabularies for each language.

4. Computational Efficiency

Shorter token sequences mean faster processing. Balancing vocabulary size against sequence length optimizes the computational cost of running the model.

How Tokenization Works

Modern LLMs primarily use variations of Byte Pair Encoding (BPE), a compression algorithm adapted for natural language processing.

The BPE Process

  1. Start with characters: Begin with individual characters as the initial vocabulary
  2. Count pairs: Find the most frequent adjacent pair of tokens
  3. Merge: Combine that pair into a new token
  4. Repeat: Continue until reaching the desired vocabulary size

For example, starting with the text "low lower lowest":

  • Initial: ['l', 'o', 'w', ' ', 'l', 'o', 'w', 'e', 'r', ' ', 'l', 'o', 'w', 'e', 's', 't']
  • Most frequent pair: 'l' + 'o' → merge into 'lo'
  • Next: 'lo' + 'w' → merge into 'low'
  • Continue until vocabulary is complete
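
The loop above can be sketched in a few lines of Python (a toy illustration only; production tokenizers such as tiktoken and SentencePiece operate on bytes and add many refinements):

# Toy BPE: repeatedly merge the most frequent adjacent pair of tokens.
# Illustrative only; real tokenizers work on bytes and add many refinements.
from collections import Counter

def most_frequent_pair(seq):
    # Count every adjacent pair of tokens and return the most common one
    return Counter(zip(seq, seq[1:])).most_common(1)[0][0]

def merge(seq, pair):
    # Replace each occurrence of `pair` with a single merged token
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

tokens = list("low lower lowest")  # start from individual characters
for _ in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge(tokens, pair)
    print(pair, "->", tokens)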

Different Tokenizers, Different Counts

Each model family uses its own tokenizer:

Model            | Tokenizer              | Vocabulary Size
GPT-4, GPT-3.5   | cl100k_base            | ~100,000 tokens
GPT-3            | p50k_base              | ~50,000 tokens
Claude 3         | Claude tokenizer       | ~100,000 tokens
Llama 2          | SentencePiece (BPE)    | ~32,000 tokens
Llama 3          | tiktoken-style BPE     | ~128,000 tokens
Gemini           | SentencePiece variant  | ~256,000 tokens

This means the same text can have different token counts across models:

Text: "Artificial intelligence is transforming industries."

GPT-4:     6 tokens
Claude 3:  7 tokens
Llama 2:   8 tokens

When planning API costs or context usage, always use the specific tokenizer for your chosen model.

Token Counting Rules of Thumb

While exact counts require the actual tokenizer, these approximations help with quick estimates:

English Text

  • 1 token ≈ 4 characters (including spaces)
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1,000 words ≈ 1,333 tokens
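
These heuristics are easy to turn into a quick estimator (an approximation only; use the model's real tokenizer, shown later in this article, for exact counts):

# Rough token estimate for English prose based on the rules of thumb above.
# An approximation only; use the model's actual tokenizer for exact counts.

def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))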

Code

  • Code typically uses more tokens per line than prose
  • Variable names, syntax, and whitespace all consume tokens
  • A 100-line Python function might be 500-800 tokens

Special Cases

  • Numbers: Digits are often split into one or more separate tokens ("2024" may be 2-4 tokens depending on the tokenizer)
  • URLs: Very token-heavy due to punctuation and special characters
  • JSON: Brackets, colons, and quotes add up quickly
  • Non-English: Some languages (Chinese, Japanese) may use more tokens per character
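
A quick way to see the URL effect is to compare how densely a URL tokenizes versus ordinary prose (a sketch; exact counts depend on the tokenizer):

# Compare token density of a URL vs. plain prose.
# Exact counts depend on the tokenizer; cl100k_base is used here as an example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
url = "https://example.com/search?q=token+counting&page=2&sort=desc"
prose = "Search results for token counting, sorted in descending order, page two."
for s in (url, prose):
    print(f"{len(enc.encode(s))} tokens for {len(s)} characters")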

Context Windows Explained

The context window is your token budget for a conversation—it includes everything the model can "see" at once:

Context Window = Input Tokens + Output Tokens

Current Context Window Sizes

Model            | Context Window    | Approximate Pages
GPT-4 Turbo      | 128,000 tokens    | ~300 pages
GPT-4            | 8,192 tokens      | ~20 pages
Claude 3 Opus    | 200,000 tokens    | ~500 pages
Claude 3 Sonnet  | 200,000 tokens    | ~500 pages
Gemini 1.5 Pro   | 1,000,000 tokens  | ~2,500 pages
Llama 3 70B      | 8,192 tokens      | ~20 pages

Context Window Management

When your conversation exceeds the context window, you have several options:

  1. Truncation: Remove older messages from the conversation
  2. Summarization: Condense earlier context into a summary
  3. RAG (Retrieval): Fetch only relevant portions of large documents
  4. Chunking: Process documents in segments with overlap
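
A minimal chunking helper might look like this (a sketch that counts with tiktoken; real pipelines usually also respect sentence or section boundaries):

# Minimal token-based chunking with overlap.
# A sketch; real pipelines usually also respect sentence or section boundaries.
import tiktoken

def chunk_by_tokens(text: str, max_tokens: int = 1000, overlap: int = 100) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode(text)
    chunks, start = [], 0
    while start < len(ids):
        window = ids[start:start + max_tokens]
        chunks.append(enc.decode(window))
        if start + max_tokens >= len(ids):
            break
        start += max_tokens - overlap  # step forward, keeping `overlap` tokens of shared context
    return chunks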

Practical Token Counting

Using Our Token Counter Tool

The easiest way to count tokens accurately is using a dedicated tool. Our LLM Token Counter supports multiple models and shows:

  • Exact token count for your text
  • Cost estimates based on current API pricing
  • Context window usage percentage
  • Model comparisons

Programmatic Token Counting

For developers, here's how to count tokens in code:

Python with tiktoken (OpenAI models):

import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Look up the tokenizer used by this model, then count the encoded tokens
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

text = "Hello, how are you today?"
tokens = count_tokens(text)
print(f"Token count: {tokens}")  # Output: Token count: 7

Python with transformers (Llama, open models):

from transformers import AutoTokenizer

# Requires access to the gated meta-llama repository on Hugging Face
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
text = "Hello, how are you today?"
tokens = tokenizer.encode(text)  # includes special tokens (e.g. the BOS token) by default
print(f"Token count: {len(tokens)}")

API Response Token Counts

Most LLM APIs return token usage in responses:

{
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 128,
    "total_tokens": 184
  }
}

Track these values to monitor actual usage against estimates.
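
With the OpenAI Python SDK, for example, the usage object can be read straight off the response (a sketch; assumes the v1+ SDK and an API key in the environment):

# Log token usage from a Chat Completions response (OpenAI Python SDK v1+).
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain tokenization in one sentence."}],
)

usage = response.usage
print(f"prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens}, total: {usage.total_tokens}")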

Tokens and Pricing

API pricing is directly tied to tokens, typically charged per 1,000 or 1 million tokens:

Current Pricing Examples (as of 2026)

Model                  | Input (per 1M tokens)  | Output (per 1M tokens)
GPT-4 Turbo            | $10.00                 | $30.00
GPT-4o                 | $5.00                  | $15.00
Claude 3 Opus          | $15.00                 | $75.00
Claude 3 Sonnet        | $3.00                  | $15.00
Claude 3 Haiku         | $0.25                  | $1.25
Llama 3 70B (via API)  | $0.70                  | $0.90

Cost Calculation Example

Processing a 10,000-word document (~13,333 tokens input) and generating a 500-word summary (~667 tokens output) with GPT-4 Turbo:

Input cost:  13,333 ÷ 1,000,000 × $10.00 = $0.13
Output cost: 667 ÷ 1,000,000 × $30.00 = $0.02
Total: $0.15 per document

At scale (10,000 documents/month): $1,500/month
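
The same arithmetic is easy to wrap in a small helper (the rates below mirror the GPT-4 Turbo row above and will change over time):

# Per-request cost from token counts and per-million-token prices.
# Rates here mirror the GPT-4 Turbo row of the table above and will change over time.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

cost = request_cost(13_333, 667, input_price_per_m=10.00, output_price_per_m=30.00)
print(f"${cost:.2f} per document")  # ~$0.15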

Understanding token economics helps you budget accurately and choose cost-effective models.

Optimizing Token Usage

Write Concise Prompts

Every word in your prompt costs tokens. Compare:

Verbose (32 tokens):

I would really appreciate it if you could please help me by
summarizing the following article for me in a concise manner.

Concise (11 tokens):

Summarize this article in 3 bullet points:
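
You can check the difference yourself with a tokenizer (a sketch; exact counts vary slightly by model):

# Compare the token cost of a verbose prompt vs. a concise one.
# Exact counts vary slightly by model and tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
verbose = ("I would really appreciate it if you could please help me by "
           "summarizing the following article for me in a concise manner.")
concise = "Summarize this article in 3 bullet points:"
print(len(enc.encode(verbose)), "vs", len(enc.encode(concise)))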

Use System Prompts Efficiently

System prompts are included with every message. Keep them focused:

Inefficient:

You are a helpful AI assistant. You should always be polite,
professional, and thorough in your responses. You have expertise
in many areas including technology, science, business, and more.

Efficient:

You are a technical writer. Be concise and accurate.

Leverage Structured Output

Request specific formats to reduce unnecessary tokens:

Return JSON only: {"summary": "...", "key_points": [...]}

Batch Similar Requests

Instead of multiple API calls, batch related queries:

Analyze these 5 reviews and return sentiment for each:
1. [review 1]
2. [review 2]
...
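
In code, batching can be as simple as joining the items into a single prompt (a sketch; the reviews here are placeholder strings):

# Build one batched prompt instead of issuing one API call per review.
# The reviews below are placeholder strings for illustration.
reviews = ["Great product, arrived early.", "Stopped working after a week.", "Okay for the price."]

numbered = "\n".join(f"{i}. {review}" for i, review in enumerate(reviews, start=1))
prompt = f"Analyze these {len(reviews)} reviews and return sentiment for each:\n{numbered}"
print(prompt)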

Common Tokenization Pitfalls

Surprising Token Counts

Some text is surprisingly token-heavy:

  • Whitespace: Multiple spaces or tabs may tokenize separately
  • Special characters: Emoji can be 2-4 tokens each
  • Base64/encoded data: Extremely token-inefficient
  • Repetition: Repeated text isn't compressed

Language Differences

Non-English languages often require more tokens:

Language  | Tokens per 1,000 characters
English   | ~250 tokens
Spanish   | ~280 tokens
Chinese   | ~350 tokens
Japanese  | ~400 tokens
Arabic    | ~320 tokens

Factor this into multilingual applications.

Code Tokenization

Code tokenizes differently than prose:

# This function might be 15-20 tokens
def calculate_total(items):
    return sum(item.price for item in items)

Variable names, operators, and syntax all contribute. Minifying code doesn't necessarily save many tokens, since meaningful names and normal whitespace don't dramatically increase the count.
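
You can verify this by running a snippet through a tokenizer (a sketch using tiktoken; other models' tokenizers will give somewhat different counts):

# Count the tokens in a small code snippet.
# Uses cl100k_base; other models' tokenizers will give somewhat different counts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
snippet = (
    "def calculate_total(items):\n"
    "    return sum(item.price for item in items)\n"
)
print(len(enc.encode(snippet)))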

Conclusion

Tokens are the currency of large language models. Understanding how tokenization works empowers you to:

  • Estimate costs before committing to API usage
  • Optimize prompts for efficiency without sacrificing quality
  • Choose appropriate models based on context needs and budget
  • Debug unexpected behavior when token limits are exceeded

As AI becomes more integrated into applications, token literacy becomes a valuable skill for developers, product managers, and anyone working with LLMs.

Ready to count tokens for your specific use case? Try our LLM Token Counter to get exact counts for GPT-4, Claude, Llama, and other popular models.

Frequently Asked Questions

What is a token in a large language model?

A token is the basic unit of text that large language models process. Tokens can be whole words, parts of words (subwords), punctuation marks, or even individual characters. For example, the word "tokenization" might be split into "token" + "ization" as two separate tokens. Most English words are 1-2 tokens, while complex or uncommon words may be 3-4 tokens.
