2.1 Module 2 · How LLMs Work

Prediction: The Core Mechanic

Next-word prediction game — see a sentence build one token at a time. Adjust temperature and watch how randomness changes the output.

Next-Word Prediction Game

Build a sentence one word at a time, just like an LLM does. Pick a starter, then choose the next word from the model's top predictions. Watch the sentence probability drop with each choice.

How this works

LLMs are fundamentally next-token predictors. When you send a prompt, the model doesn't "understand" your request the way a human would. Instead, it computes a probability for every token in its vocabulary (a token is a word or a piece of a word), picks one, appends it to the text so far, and repeats.
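The loop described above can be sketched in a few lines of Python. This is a toy illustration only: `TOY_MODEL`, its words, and its probabilities are made up for the example, and it looks only at the previous word, whereas a real LLM conditions on the whole context and scores its entire vocabulary.

```python
import random

# Hypothetical toy "model": previous word -> probabilities for the next word.
# All words and numbers here are invented for illustration.
TOY_MODEL = {
    "the": {"cat": 0.5, "dog": 0.3, "coffee": 0.2},
    "cat": {"sat": 0.6, "slept": 0.4},
    "dog": {"barked": 0.7, "sat": 0.3},
    "coffee": {"is": 1.0},
    "sat": {"<end>": 1.0},
    "slept": {"<end>": 1.0},
    "barked": {"<end>": 1.0},
    "is": {"<end>": 1.0},
}


def generate(start, max_tokens=10, seed=None):
    """Predict, pick, append, repeat until an end marker or the token limit."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(max_tokens):
        probs = TOY_MODEL.get(tokens[-1])
        if probs is None:  # no predictions known for this word
            break
        candidates = list(probs)
        weights = list(probs.values())
        # Pick one candidate at random, weighted by its probability.
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate("the", seed=0)))
```

Running `generate` with different seeds gives different sentences from the same starter, because each step samples from the predicted distribution rather than always taking the top word.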

Each word you choose becomes part of the context for the next prediction, so it narrows the space of likely continuations. The more words in the sentence, the more constrained the prediction becomes. The probability bars in the game show the model's confidence in each candidate: a higher bar means the model considers that word more likely to come next.
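The falling "sentence probability" in the game follows from simple arithmetic: the probability of a whole sentence is the product of the probabilities of each word chosen along the way, and every factor is at most 1. A minimal sketch, assuming three hypothetical per-step probabilities (the numbers are invented, not taken from the game):

```python
import math


def sentence_probability(step_probs):
    """Multiply the probability of the word chosen at each step."""
    return math.prod(step_probs)


# Hypothetical probabilities of the word picked at steps 1, 2 and 3.
choices = [0.5, 0.6, 0.9]

running = 1.0  # an empty continuation starts at 100%
for step, p in enumerate(choices, start=1):
    running *= p
    print(f"after word {step}: {running:.1%}")
# after word 1: 50.0%
# after word 2: 30.0%
# after word 3: 27.0%
```

Even when every individual pick is fairly likely, the product shrinks with each added word, which is why long sentences always have a small overall probability.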

This simple loop — predict, pick, repeat — is the engine behind every conversation, essay, and code snippet an LLM produces. Everything else (fine-tuning, RLHF, system prompts) just shapes which predictions the model makes.

Temperature Playground

Temperature controls how "random" a model's word choices are. Drag the slider to see how the same prompt produces wildly different outputs at different temperature values.
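The lesson does not spell out the maths, but a common way to apply temperature is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with softmax. The sketch below assumes that approach; the candidate words and scores are hypothetical, and treating a temperature of 0 as "always pick the top word" is a convention chosen for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption for this sketch: temperature 0 means greedy selection,
        # so all probability goes to the highest-scoring candidate.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate first words for a tagline, with made-up scores.
words = ["Fresh", "Bold", "Cosmic"]
logits = [2.0, 1.0, 0.0]

for t in (0.0, 0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(words, probs))
    print(f"temperature {t}: {summary}")
```

Low temperatures concentrate probability on the top-scoring word, while higher temperatures spread it across the alternatives, so unlikely words get picked more often.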

Prompt: "Write a tagline for a coffee shop"
Temperature slider: starts at 0.0 (very predictable) and runs from Deterministic through Creative to Chaotic.

When to use each temperature: Use low temperatures (0.0–0.3) for factual tasks like data extraction, maths, and code, where you want reliable, repeatable results. Use medium temperatures (0.5–0.8) for balanced writing tasks. Use high temperatures (1.0+) for brainstorming, creative fiction, or when you explicitly want surprising outputs. Many APIs default to somewhere around 0.7–1.0, so check your provider's documentation and set the value explicitly when the task calls for it.