Hallucination — Physea Wiki

A hallucination is a fluent, confident statement that is simply wrong. It happens because the way models are trained and graded rewards guessing over saying “I don’t know.”

A hallucination occurs when a model states something that sounds right but is false. The tricky part is the tone: the wrong answer arrives with the same calm confidence as a correct one, so nothing in the output itself tells you which is which.

A 2025 paper from OpenAI and Georgia Tech offers a clear explanation. It compares a model to a student facing a hard exam question: when unsure, both tend to guess, “producing plausible yet incorrect statements instead of admitting uncertainty.”^[1] The authors argue this is not a mysterious flaw but a predictable result of how models are built and graded. Most benchmarks score an answer as right or wrong and give no credit for saying “I don’t know,” so a model that guesses scores better on average than one that admits doubt. Over millions of training examples, the model learns to bluff.^[1]
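The incentive can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: `expected_score`, `ABSTAIN_SCORE`, and the `wrong_penalty` parameter are hypothetical names. It assumes a correct answer earns 1 point, "I don't know" earns 0, and, under right-or-wrong grading, a wrong answer also earns 0.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Average score for answering when the guess is right with probability p_correct.

    A correct answer earns 1 point; a wrong answer loses wrong_penalty points.
    wrong_penalty=0.0 models plain right-or-wrong grading (assumption: hypothetical
    parameter added for illustration, not a setting from any real benchmark).
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


ABSTAIN_SCORE = 0.0  # "I don't know" earns no credit under this grading scheme

for p in (0.1, 0.3, 0.5):
    guess = expected_score(p)
    print(f"chance of being right={p:.1f}: guess={guess:.2f}, abstain={ABSTAIN_SCORE:.2f}")
```

Under these assumptions, guessing beats abstaining whenever there is any chance at all of being right, which is the pressure described above. Setting `wrong_penalty` above zero is one hypothetical way to see how the balance would change if wrong answers cost points.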

The paper gives a concrete demonstration. Asked for a specific researcher’s birthday with the instruction to answer only if known, a leading model produced three different wrong dates across three attempts.^[1] Given that instruction, the right response was to decline, but guessing is the habit it was trained into.
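The same observation suggests a rough check you can run yourself: ask the question several times and see whether the answers agree. The sketch below is an assumption-laden illustration, not a method from the paper. `ask` is a hypothetical stand-in for whatever function calls your model, and the canned answers are placeholders rather than real model output.

```python
from collections import Counter
from typing import Callable


def sample_answers(ask: Callable[[str], str], question: str, attempts: int = 3) -> list[str]:
    """Ask the same question several times and collect the raw answers."""
    return [ask(question) for _ in range(attempts)]


def answers_disagree(answers: list[str]) -> bool:
    """Return True if the answers differ after trimming whitespace and ignoring case."""
    distinct = Counter(a.strip().lower() for a in answers)
    return len(distinct) > 1


# Hypothetical stand-in for a model call: returns a different canned answer each time.
canned = iter(["date A", "date B", "date C"])
answers = sample_answers(lambda q: next(canned), "When was the researcher born?")

if answers_disagree(answers):
    print("Answers disagree; treat the claim as unverified:", answers)
```

Disagreement is a warning sign, but agreement is not proof: a model can repeat the same wrong answer every time, so anything that matters still needs checking against a source.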

The practical lesson is to treat any specific factual claim, especially names, dates, numbers, and citations, as unverified until you check it. The fix is not to expect the model to stop guessing on its own, but to give it the facts to work from (see retrieval) and to verify anything that matters.

References

[1] Why Language Models Hallucinate. Kalai, Nachum, Vempala, Zhang (OpenAI / Georgia Tech), 2025.

