When it helps — Physea Wiki

Step-by-step reasoning pays off on multi-step problems such as math, logic, and planning. On simple recall or lookup questions it adds time and cost without making the answer better.

Step-by-step reasoning is not always worth it. It earns its keep on problems that genuinely take several steps, and it mostly wastes effort on problems that do not.

The original chain-of-thought paper tested it on arithmetic, commonsense, and symbolic reasoning, the kinds of tasks where a person would also need to work through a few stages.^[1] Anthropic points to the same fit, listing mathematical proofs, logical reasoning, and algorithm design as good cases for extended thinking.^[2] OpenAI describes its reasoning models as suited to “complex problem-solving, coding, scientific reasoning, and multi-step workflows.”^[3] The common thread is that the answer cannot be reached in one jump.

The cost is real. Reasoning uses extra tokens and adds time, which means more money and a slower reply. On a simple lookup, a single-fact question, or a short rewrite, that spending buys nothing because there are no intermediate steps to take. For those, a direct answer is both faster and cheaper.
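The Python sketch below makes the cost concrete. It compares a direct answer with one that also carries hidden reasoning tokens. The price, generation speed, and token counts are hypothetical numbers chosen only for illustration. It also assumes reasoning tokens are billed and generated like ordinary output tokens. Check your provider's pricing for real figures.

```python
from dataclasses import dataclass

# All numbers here are hypothetical, picked only to illustrate the arithmetic.
PRICE_PER_MILLION_OUTPUT_TOKENS = 10.0  # assumed price in dollars
TOKENS_PER_SECOND = 50.0                # assumed generation speed


@dataclass
class ReplyCost:
    tokens: int
    dollars: float
    seconds: float


def reply_cost(answer_tokens: int, reasoning_tokens: int = 0) -> ReplyCost:
    """Estimate the cost and latency of one reply.

    Assumes reasoning tokens are billed and generated like ordinary output tokens.
    """
    total = answer_tokens + reasoning_tokens
    return ReplyCost(
        tokens=total,
        dollars=total * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000,
        seconds=total / TOKENS_PER_SECOND,
    )


# A single-fact lookup: the visible answer is equally short either way,
# but the reasoning run also pays for tokens the user never sees.
direct = reply_cost(answer_tokens=20)
with_reasoning = reply_cost(answer_tokens=20, reasoning_tokens=500)

print(direct)
print(with_reasoning)
print(f"{with_reasoning.tokens / direct.tokens:.0f}x the tokens for the same answer")
```

Under these assumptions, the reasoning run spends 26 times the tokens on a question whose answer did not need them.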

Rule of thumb: If you could answer the question in one step yourself, the model probably can too, so skip the reasoning. If you would need scratch paper, let the model use some.
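The rule of thumb can be written as a tiny routing function. This is a minimal sketch. The task categories come from the examples in this article. `choose_mode`, the `Mode` enum, and the choice to send unknown tasks to a direct answer are illustrative assumptions, not part of any provider's API.

```python
from enum import Enum


class Mode(Enum):
    DIRECT = "direct"        # answer in one step, no reasoning
    REASONING = "reasoning"  # let the model work through intermediate steps


# Task categories drawn from this article; illustrative, not exhaustive.
MULTI_STEP = {"math", "logic", "planning", "coding", "proof", "algorithm design"}
ONE_STEP = {"lookup", "single fact", "short rewrite"}


def choose_mode(task_kind: str) -> Mode:
    """Apply the scratch-paper rule: reason only when the task needs intermediate steps."""
    kind = task_kind.strip().lower()
    if kind in MULTI_STEP:
        return Mode.REASONING
    if kind in ONE_STEP:
        return Mode.DIRECT
    # Assumption: unknown task kinds start with the cheaper direct answer.
    return Mode.DIRECT


for kind in ["math", "lookup", "Planning", "short rewrite"]:
    print(f"{kind}: {choose_mode(kind).value}")
```

Defaulting to a direct answer is a cost-first choice. A system that cares more about accuracy than spend could reasonably default the other way.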

Some newer models try to make this choice for you. Anthropic’s later models added adaptive thinking, where the model decides how much to think based on how hard the question looks, rather than thinking the same fixed amount every time.^[2]
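The sketch below contrasts a fixed thinking budget with an adaptive one. It assumes a difficulty score between 0 and 1 is already available; here the scores are simply written by hand. The budget numbers and the `adaptive_budget` helper are hypothetical. The sketch illustrates the idea only and is not how Anthropic implements adaptive thinking.

```python
# Hypothetical contrast between a fixed thinking budget and one scaled by
# estimated difficulty. Not any vendor's actual mechanism.

FIXED_BUDGET = 4000  # same number of thinking tokens for every question


def adaptive_budget(difficulty: float, max_budget: int = 8000) -> int:
    """Scale a thinking-token budget by a difficulty estimate in [0, 1].

    A difficulty of 0 means no thinking at all; 1 means the full budget.
    Out-of-range scores are clamped.
    """
    clamped = min(max(difficulty, 0.0), 1.0)
    return round(clamped * max_budget)


# Hand-assigned difficulty scores (hypothetical).
questions = {
    "What is the capital of France?": 0.0,
    "Rewrite this sentence more formally.": 0.1,
    "Plan a three-city trip under a fixed budget.": 0.6,
    "Prove there are infinitely many primes.": 0.9,
}

for question, difficulty in questions.items():
    print(f"fixed={FIXED_BUDGET:5d} adaptive={adaptive_budget(difficulty):5d}  {question}")
```

In a real system the hard part is estimating difficulty in the first place, which this sketch sidesteps.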

References

[1] Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022.
[2] Anthropic, "Building with extended thinking."
[3] OpenAI, "Reasoning models."
