
Evaluating trust

Why do models invent answers instead of saying “I don’t know”?

Models invent plausible answers because the way they are scored rewards guessing over saying “I don’t know.” Like a student on a multiple-choice test, a model can earn points with a guess but none with an honest blank, so it learns to guess.

Last updated 2026-06-15 · Physea Labs

A made-up but plausible-sounding answer is usually called a hallucination. The natural question is why a system would produce one at all instead of simply admitting it does not know.

One answer comes from a 2025 paper by researchers at OpenAI and Georgia Tech. Their argument is that hallucinations are not a mysterious glitch but a predictable result of how models are scored. The comparison they use is an exam: “Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.”[1] On most benchmarks a guess that happens to be right earns a point, while “I don’t know” earns nothing, so “guessing when uncertain improves test performance.”[1]

Their proposed fix is not another hallucination test but a change to how existing tests are scored, so that a confident wrong answer is penalized more than an honest expression of uncertainty.[1] For someone deciding whether to trust an answer, the takeaway is that the system was trained on incentives that favor a fluent guess. A question whose answer the model may not actually know is exactly where you should slow down and verify.
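The scoring argument can be made concrete with a small expected-value calculation. The sketch below is illustrative only: the function names and the penalty of 1 point for a wrong answer are assumptions chosen for the example, not values taken from the paper. It compares guessing with abstaining under plain right-or-wrong grading and under grading that penalizes wrong answers.

```python
# Illustrative sketch: names and the penalty value are assumptions, not from the paper.

ABSTAIN_SCORE = 0.0  # "I don't know" earns nothing under both schemes


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the answer is right with probability p_correct.

    A correct answer earns 1 point; a wrong answer loses wrong_penalty points.
    wrong_penalty=0.0 models plain right-or-wrong grading.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Return the score-maximizing choice: 'guess' or 'abstain'."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


for p in (0.1, 0.25, 0.5, 0.9):
    plain = best_action(p)                          # no penalty for wrong answers
    penalized = best_action(p, wrong_penalty=1.0)   # assumed 1-point penalty
    print(f"p={p}: plain grading -> {plain}, penalized grading -> {penalized}")
```

Under plain grading, guessing beats abstaining for any nonzero chance of being right. With a penalty of k points per wrong answer, guessing only pays off when the chance of being right exceeds k / (1 + k), which is one half for the assumed penalty of 1.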

References

  1. Why Language Models Hallucinate — Kalai, Nachum, Vempala & Zhang, OpenAI / arXiv