Hallucination — Physea Wiki

A model's job is to produce likely text, not verified truth, so it sometimes states false things confidently. The main fixes are letting it admit uncertainty and grounding its answers in supplied sources.

A hallucination is when a model states something false as if it were true. The text reads well and sounds confident, which is exactly what makes it dangerous.

The cause is built into how these systems work. A model is trained to produce the most likely continuation of text, not to check facts against the world. One study frames the problem in terms of how models are trained and graded: the procedures reward guessing over admitting uncertainty, so a model behaves “like students facing hard exam questions” and guesses when unsure, “producing plausible yet incorrect statements” instead of saying it does not know.^[1] If a wrong guess scores no worse than an admission of uncertainty, while a lucky guess earns credit, guessing is the better bet for the model.
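The incentive can be put in numbers. The sketch below is illustrative arithmetic only, not the cited study's scoring scheme: it assumes a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs a hypothetical `wrong_penalty`.

```python
# Illustrative only: the scoring values and names here are assumptions.
ABSTAIN_SCORE = 0.0  # assumed score for answering "I don't know"


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_guess(p_correct: float, wrong_penalty: float = 0.0) -> bool:
    """True when guessing beats abstaining in expectation."""
    return expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE


# With no penalty for wrong answers, even a 20% chance makes guessing pay.
print(should_guess(0.2))
# With a 1-point penalty, the same 20% guess loses to abstaining.
print(should_guess(0.2, wrong_penalty=1.0))
```

Under these assumptions, a grader that never penalizes wrong answers makes guessing worthwhile at any nonzero chance of being right, which is the behavior the paragraph above describes.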

The most direct fix is to remove the penalty for honesty. Give the model explicit permission to say it is unsure. Anthropic’s guidance lists this first: “Allow Claude to say ‘I don’t know’… This simple technique can drastically reduce false information.”^[2] A line like “If the document does not contain the answer, say so” is often enough.
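As a minimal sketch of what this might look like in practice, the helper below assembles a prompt that carries the permission line. The function name `build_prompt` and the `<document>` wrapper are assumptions made for illustration; no model API is called, and the resulting string would be passed to whatever client you use.

```python
# Hypothetical helper: the name and prompt layout are illustrative assumptions.
ABSTAIN_INSTRUCTION = "If the document does not contain the answer, say so."


def build_prompt(document: str, question: str) -> str:
    """Wrap the source text and question, then add the permission to abstain."""
    return (
        f"<document>\n{document}\n</document>\n\n"
        f"Question: {question}\n\n"
        f"{ABSTAIN_INSTRUCTION}"
    )


prompt = build_prompt(
    document="The warranty lasts two years.",
    question="Does the warranty cover water damage?",
)
print(prompt)
```

Placing the instruction after the question keeps it close to the point where the model begins its answer; that placement is a design choice here, not a requirement from the cited guidance.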

The second fix is grounding. Instead of asking the model to answer from memory, give it the source material and ask it to base its answer on that. For long documents, the same guidance suggests asking the model to pull exact quotes first, then answer using only those quotes, and to drop any claim it cannot support with a quote.^[2] This keeps the answer tied to text you can check.
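The "drop any claim it cannot support with a quote" step can also be checked mechanically on your side. The sketch below assumes the model has already returned its answer as (claim, quote) pairs, which is a hypothetical output format chosen for this example, and keeps only the claims whose quote appears verbatim in the source.

```python
# Hypothetical post-check: the (claim, quote) pair format is an assumption.
def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return " ".join(text.split())


def supported_claims(source: str, claims: list[tuple[str, str]]) -> list[str]:
    """Return the claims whose supporting quote occurs verbatim in the source.

    An empty quote counts as unsupported.
    """
    haystack = normalize(source)
    kept = []
    for claim, quote in claims:
        needle = normalize(quote)
        if needle and needle in haystack:
            kept.append(claim)
    return kept


source = "The warranty lasts two years.\nReturns are accepted within 30 days."
claims = [
    ("The warranty is two years.", "The warranty lasts two years."),
    ("Returns are accepted for 60 days.", "Returns are accepted within 60 days."),
    ("Shipping is free.", ""),
]
print(supported_claims(source, claims))
```

Exact substring matching is deliberately strict: a quote that the model altered, even slightly, is treated as unsupported. It confirms only that the quote exists in the source, not that the quote actually backs the claim, so a human check is still needed for that.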

Still verify

These techniques reduce hallucination; they do not remove it. For anything high-stakes, confirm the facts yourself.

References

[1] Why Language Models Hallucinate. Kalai, Nachum, Vempala & Zhang (OpenAI / Georgia Tech).
[2] Reduce hallucinations. Anthropic.

