Why models guess — Physea Wiki

Language models invent plausible answers because the way they are scored rewards guessing over saying “I don’t know.” Like a student on a multiple-choice test, a model can earn points with a lucky guess but none with an honest blank, so it learns to guess.

A made-up but plausible-sounding answer is usually called a hallucination. The natural question is why a system would produce one at all instead of simply admitting it does not know.

One answer comes from a 2025 paper by researchers at OpenAI. Their argument is that hallucinations are not a mysterious glitch but a predictable result of how models are scored. The comparison they use is an exam: “Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.”^[1] On most benchmarks a guess that happens to be right earns a point, while “I don’t know” earns nothing, so “guessing when uncertain improves test performance.”^[1]
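The arithmetic behind that incentive fits in a few lines. The sketch below assumes a single question graded 1 point for a correct answer and 0 for anything else. `p_correct` is a hypothetical stand-in for how likely the model's best guess is to be right. This is an illustration of binary grading, not code from the paper.

```python
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected points on one question under binary grading.

    Assumed scheme: a correct answer earns 1 point; a wrong answer and
    "I don't know" both earn 0. p_correct is a hypothetical probability
    that the model's best guess is right.
    """
    if abstain:
        return 0.0  # an honest blank never earns anything
    # A right guess earns 1 and a wrong guess costs nothing, so the
    # expected score is simply the chance of being right.
    return p_correct


for p in (0.9, 0.5, 0.1):
    guess = expected_score_binary(p, abstain=False)
    blank = expected_score_binary(p, abstain=True)
    print(f"p={p:.1f}  guess={guess:.2f}  abstain={blank:.2f}")
```

Under this scheme guessing is never worse than abstaining: even a 10% shot has a positive expected score, while “I don’t know” is always worth zero.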

Their proposed fix is not another hallucination test but a change to how existing tests are scored, so that a confident wrong answer is penalized more than an honest expression of uncertainty.^[1] For someone deciding whether to trust an answer, the takeaway is that the system was shaped by incentives that favor a fluent guess. Questions the model may not actually know the answer to are exactly where you should slow down and verify.
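To see how a penalty changes the calculation, the sketch below assumes one concrete scoring scheme in the spirit of that proposal: a correct answer earns 1 point, “I don’t know” earns 0, and a wrong answer costs t/(1−t) points for a chosen confidence threshold t. The function names and the threshold value are hypothetical, chosen only for illustration.

```python
def expected_score_penalized(p_correct: float, threshold: float, abstain: bool) -> float:
    """Expected points when a wrong answer is penalized.

    Assumed scheme: correct = +1, "I don't know" = 0,
    wrong = -threshold / (1 - threshold). p_correct is a hypothetical
    probability that the model's answer is right.
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    if abstain:
        return 0.0  # abstaining is neither rewarded nor punished
    penalty = threshold / (1.0 - threshold)
    return p_correct * 1.0 - (1.0 - p_correct) * penalty


def should_answer(p_correct: float, threshold: float) -> bool:
    """Answering beats abstaining only when its expected score is positive."""
    return expected_score_penalized(p_correct, threshold, abstain=False) > 0.0


t = 0.75  # hypothetical confidence target: a wrong answer costs 3 points
for p in (0.9, 0.75, 0.5):
    score = expected_score_penalized(p, t, abstain=False)
    print(f"p={p:.2f}  answer={score:+.2f}  answer? {should_answer(p, t)}")
```

With t = 0.75 a wrong answer costs 3 points, so answering at 90% confidence still has a positive expected score (0.9 − 0.1 × 3 = 0.6), while answering at 50% does not (0.5 − 0.5 × 3 = −1). At exactly p = t the expected score is zero, so under this scheme answering pays only when confidence exceeds the stated threshold.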

References

[1] Kalai, Nachum, Vempala & Zhang (2025). Why Language Models Hallucinate. OpenAI / arXiv.
