How Large Language Models Work

See how LLMs break text into tokens and predict what comes next. These two concepts explain almost everything about how AI generates text.


Token Visualiser

Type or edit the text below to see how an LLM would split it into tokens. Each coloured block is one token. This is a simplified simulation — real tokenisers use byte-pair encoding (BPE).
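To make the BPE idea concrete, here is a minimal sketch of how merge rules can be learned and applied. It is a toy, not the tokeniser any real model uses: the function names (learn_bpe_merges, merge_pair, tokenise) and the tiny corpus are illustrative assumptions, and production tokenisers work on bytes with vocabularies of many thousands of merges.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Start with every word split into single characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair everywhere in the corpus.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def tokenise(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical training corpus, chosen only for illustration.
merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(tokenise("lowest", merges))
print(tokenise("NATO", merges))  # unseen text falls back to single characters
```

Frequent character sequences end up as single tokens, while text unlike the training corpus stays fragmented into many small pieces.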

Why this matters: Tokens determine cost, speed, and context window usage. A 128K token context window might hold around 100,000 words of plain English, based on the common rule of thumb of roughly three-quarters of a word per token. Acronyms, technical jargon, and non-English text often use more tokens per word, so the same window holds fewer of them.
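The arithmetic behind that estimate can be sketched in a few lines. The ratios here are assumptions for illustration only: 4/3 tokens per word is a commonly quoted rule of thumb for plain English, and the 2.0 figure for jargon-heavy text is a hypothetical value, not a measurement. The function name words_that_fit is also made up for this example.

```python
def words_that_fit(context_tokens, tokens_per_word):
    """Estimate how many words fit in a context window of a given token size."""
    return round(context_tokens / tokens_per_word)


CONTEXT_TOKENS = 128_000       # a "128K" window, taken here as 128,000 tokens

PLAIN_ENGLISH = 4 / 3          # assumed rule of thumb: about 1.33 tokens per word
JARGON_HEAVY = 2.0             # hypothetical ratio for acronym-dense text

print(words_that_fit(CONTEXT_TOKENS, PLAIN_ENGLISH))
print(words_that_fit(CONTEXT_TOKENS, JARGON_HEAVY))
```

Under these assumptions the same window holds about a third fewer words once the text becomes acronym-dense. Actual ratios depend on the tokeniser and the text, so measuring a sample is more reliable than any fixed figure.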

Next-Token Predictor

See how the model predicts one token at a time. Click on a token to select it, and watch the probabilities update for the next position. Adjust the temperature to see how randomness affects selection.

Current Sequence

The capital of Australia is

Temperature: 0.7 (scale: 0.0 is deterministic, 1.0 is balanced, 2.0 is chaotic)
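One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with a softmax. The sketch below assumes that approach. The candidate tokens and their logits are invented for the sequence "The capital of Australia is" and do not come from any real model.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy: all probability on the top score.
        top = logits.index(max(logits))
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(candidates, logits, temperature, rng=random):
    """Pick one candidate token at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates and scores, for illustration only.
CANDIDATES = [" Canberra", " Sydney", " Melbourne", " a"]
LOGITS = [5.0, 3.0, 2.0, 0.5]

for temperature in (0.0, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(LOGITS, temperature)
    summary = ", ".join(f"{tok!r}: {p:.2f}" for tok, p in zip(CANDIDATES, probs))
    print(f"T={temperature}: {summary}")
```

At low temperature nearly all the probability sits on the top candidate. As temperature rises, the distribution flattens and less likely tokens are chosen more often.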

Key insight: This is all an LLM does — predict the next token, one at a time. It does not plan ahead, it does not understand meaning, and it does not verify facts. Every response you read was built this way: one probabilistic choice after another.
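The one-token-at-a-time loop can be sketched as follows. This is a heavily simplified stand-in: the lookup table NEXT_TOKEN_PROBS is hypothetical and conditions only on the last token, whereas a real model computes probabilities from the entire sequence using a neural network. The loop structure is the part being illustrated.

```python
import random

# Hypothetical next-token probabilities, keyed by the previous token only.
NEXT_TOKEN_PROBS = {
    "The": {" capital": 1.0},
    " capital": {" of": 1.0},
    " of": {" Australia": 1.0},
    " Australia": {" is": 1.0},
    " is": {" Canberra": 0.8, " Sydney": 0.2},
    " Canberra": {".": 1.0},
    " Sydney": {".": 1.0},
}


def generate(prompt_tokens, max_new_tokens, rng):
    """Extend the prompt by repeatedly sampling one next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        options = NEXT_TOKEN_PROBS.get(tokens[-1])
        if options is None:
            break  # no known continuation, so stop generating
        candidates = list(options)
        weights = list(options.values())
        # One probabilistic choice; nothing here checks whether it is true.
        tokens.append(rng.choices(candidates, weights=weights, k=1)[0])
    return tokens


prompt = ["The", " capital", " of", " Australia", " is"]
print("".join(generate(prompt, max_new_tokens=2, rng=random.Random())))
```

In this toy table, " Sydney" is sampled about one time in five, and the loop appends it just as readily as " Canberra", because no step compares the output against a source of truth.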

Try It Yourself

Explore real tokenisation with OpenAI's official tool. Paste in Defence-specific text and see how acronyms and jargon tokenise differently.

OpenAI Tokenizer

See exactly how GPT models tokenise any text. Try pasting in a Defence brief and see how many tokens it uses.
