How Large Language Models Work
See how LLMs break text into tokens and predict what comes next. These two concepts explain almost everything about how AI generates text.
Token Visualiser
Type or edit the text below to see how an LLM would split it into tokens. Each coloured block is one token. This is a simplified simulation; many production tokenisers instead use subword algorithms such as byte-pair encoding (BPE), so real token boundaries will differ.
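To make the BPE idea concrete, here is a minimal sketch of how merge rules could be learned and applied. It is a toy version under stated assumptions: it works on characters rather than bytes, skips the pre-tokenisation step real tokenisers use, and the function names (apply_merge, learn_bpe_merges, tokenise) are illustrative rather than taken from any library.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair everywhere it appears.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def tokenise(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(tokenise("lowest", merges))
```

With this tiny corpus the frequent fragment "low" becomes a single token, while the rarer endings stay as individual characters. The same effect, at scale, is why common English words are often one token and unusual acronyms are split into several.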
Why this matters: Tokens determine cost, speed, and context window usage. A 128K-token context window might hold around 100,000 words of plain English, but acronyms, technical jargon, and non-English text often use more tokens per word, so the same window holds less of that kind of text.
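The arithmetic behind that estimate can be sketched in a few lines. The ratio of 0.75 words per token is a common rule of thumb for English, not a fixed property of any model, and the price used below is a made-up figure for illustration only. Both function names are hypothetical.

```python
def estimate_words(context_tokens, words_per_token=0.75):
    """Rough word capacity of a context window.

    words_per_token=0.75 is an assumed rule of thumb for plain English;
    jargon-heavy or non-English text usually has a lower ratio.
    """
    return int(context_tokens * words_per_token)


def estimate_cost(num_tokens, price_per_million_tokens):
    """Cost of processing num_tokens at an assumed per-million-token price."""
    return num_tokens / 1_000_000 * price_per_million_tokens


# Plain English: 128,000 tokens * 0.75 = 96,000 words, i.e. "around 100,000".
print(estimate_words(128_000))

# Assumed ratio for acronym-heavy text: the same window holds far fewer words.
print(estimate_words(128_000, words_per_token=0.5))

# Hypothetical price of 2.00 per million tokens, purely illustrative.
print(estimate_cost(500_000, price_per_million_tokens=2.0))
```

The second call shows why tokenisation efficiency matters: if jargon drops the ratio to an assumed 0.5 words per token, the usable capacity falls by a third even though the window size is unchanged.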
Next-Token Predictor
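See how the model predicts one token at a time. Click on a token to select it, and watch the probabilities update for the next position. Adjust the temperature to see how randomness affects selection: low values concentrate probability on the most likely token, while high values spread it across more candidates.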
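The effect of temperature can be sketched with a standard softmax over raw model scores (logits). The candidate tokens and logit values below are invented for illustration, and the helper names are hypothetical; the scaling step itself (dividing logits by the temperature before the softmax) is the usual formulation.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(candidates, logits, temperature=1.0, rng=None):
    """Pick one candidate at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Illustrative values only: three candidate next tokens and their logits.
candidates = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_next_token(candidates, logits, temperature=1.0))
```

At the lowest temperature nearly all the probability sits on the top candidate, so the output is close to deterministic. As the temperature rises the distribution flattens and less likely tokens are chosen more often.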
Key insight: This is all an LLM does — predict the next token, one at a time. It does not plan ahead, it does not understand meaning, and it does not verify facts. Every response you read was built this way: one probabilistic choice after another.
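The one-token-at-a-time loop can be sketched with a toy stand-in for the model. Everything here is hypothetical: TOY_MODEL is a hand-written lookup table with invented probabilities, and it conditions only on the previous token, whereas a real LLM computes its probabilities from the entire preceding context. The loop structure (predict, sample, append, repeat) is the part being illustrated.

```python
import random

# Hypothetical toy "model": for each token, the possible next tokens and
# their probabilities. A real LLM computes these from the whole context.
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.5, "token": 0.5},
    "a": {"model": 0.5, "token": 0.5},
    "model": {"predicts": 0.7, "<end>": 0.3},
    "token": {"follows": 0.6, "<end>": 0.4},
    "predicts": {"the": 0.5, "a": 0.3, "<end>": 0.2},
    "follows": {"the": 0.5, "<end>": 0.5},
}


def generate(max_tokens=10, seed=None):
    """Build a sequence one token at a time by repeated sampling."""
    rng = random.Random(seed)
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        options = TOY_MODEL[current]
        # One probabilistic choice: sample the next token from the options.
        current = rng.choices(
            list(options), weights=list(options.values()), k=1
        )[0]
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


print(" ".join(generate(seed=1)))
print(" ".join(generate(seed=2)))
```

Different seeds give different sentences from the same table, and nothing in the loop checks whether the result is true or sensible. Each step only asks which token is likely to come next.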
Try It Yourself
Explore real tokenisation with OpenAI's official tool. Paste in Defence-specific text and see how acronyms and jargon tokenise differently.