How examples steer — Physea Wiki

Examples in a prompt mostly teach the model the format, the set of possible answers, and the kind of inputs to expect. One study found that replacing the correct labels in the examples with random ones barely hurt results, which suggests that these other signals, not the correct answers, do much of the work.

It is tempting to assume an example works by teaching the model the right answer. A 2022 study tested that assumption and got a surprising result. When the researchers kept the examples but replaced each label with one drawn at random from the set of possible answers, so that many inputs ended up paired with wrong answers, performance barely dropped. As the paper reports, “randomly replacing labels in the demonstrations barely hurts performance” across a range of classification and multiple-choice tasks and many models.^[1]
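The manipulation can be sketched in a few lines. The sketch below builds a few-shot prompt twice from the same demonstrations: once with the gold labels, and once with each label replaced by one drawn uniformly at random from the label space. The `Input:`/`Label:` template, the sentiment data, and the helper names `build_prompt` and `randomize_labels` are illustrative assumptions, not the paper's code.

```python
import random

# Hypothetical demonstrations: (input text, gold label) pairs.
DEMOS = [
    ("The plot dragged and the ending fell flat.", "negative"),
    ("A warm, funny film I would happily rewatch.", "positive"),
    ("Great cast, but the script wastes them.", "negative"),
    ("Every scene looks gorgeous.", "positive"),
]
LABEL_SPACE = ["positive", "negative"]


def build_prompt(demos, query):
    """Render the demonstrations and the query with one fixed template."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def randomize_labels(demos, label_space, seed=0):
    """Replace each gold label with one drawn uniformly from the label space.

    Inputs, order, and format are unchanged; only the input-label pairing
    changes. A random draw can coincide with the gold label.
    """
    rng = random.Random(seed)
    return [(text, rng.choice(label_space)) for text, _ in demos]


query = "Too long, but the acting is superb."
gold_prompt = build_prompt(DEMOS, query)
random_prompt = build_prompt(randomize_labels(DEMOS, LABEL_SPACE), query)
print(random_prompt)
```

Both prompts show the model the same label space, the same kind of input text, and the same format; only the pairing of inputs to labels differs, which is what lets the comparison isolate the contribution of correct labels.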

So if the correct labels are not doing the heavy lifting, what is? The same study found that examples mainly show the model three things: “(1) the label space, (2) the distribution of the input text, and (3) the overall format of the sequence.”^[1] In plainer terms, the examples reveal the set of answers the model is allowed to give, the kind of text it will be reading, and the exact shape each answer should take. The examples set the stage, and the model then performs on that stage.

Worth remembering: This does not mean you should use wrong labels on purpose. It means the structure and consistency of your examples carry more weight than people expect, so keep their format clean and uniform.
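One way to act on this advice is to render every example through a single template and check the labels against a fixed label space before the prompt is used. The sketch below is a hypothetical helper, `check_demos`, that flags empty inputs and labels outside the allowed set, including case variants such as "Negative" versus "negative". The template and the two-label set are assumptions for illustration.

```python
# Hypothetical consistency check for few-shot demonstrations: every example
# should have a non-empty input, a label from one label space, and one template.
TEMPLATE = "Input: {text}\nLabel: {label}"
LABEL_SPACE = {"positive", "negative"}


def check_demos(demos, label_space=LABEL_SPACE):
    """Return a list of problems found; an empty list means the set is clean."""
    problems = []
    for i, (text, label) in enumerate(demos):
        if not text.strip():
            problems.append(f"example {i}: empty input")
        if label not in label_space:
            # Call out near-misses such as "Negative" vs "negative" separately.
            hint = " (case mismatch)" if label.lower() in label_space else ""
            problems.append(f"example {i}: label {label!r} not in label space{hint}")
    return problems


def render(demos):
    """Render every demonstration with the same template."""
    return "\n\n".join(TEMPLATE.format(text=text, label=label) for text, label in demos)


demos = [
    ("Loved every minute.", "positive"),
    ("Dull and overlong.", "Negative"),  # inconsistent casing
    ("Sharp dialogue throughout.", "positive"),
]
for problem in check_demos(demos):
    print(problem)
# prints: example 1: label 'Negative' not in label space (case mismatch)

# After normalizing the labels, the set passes and can be rendered.
fixed = [(text, label.lower()) for text, label in demos]
if not check_demos(fixed):
    print(render(fixed))
```

A check like this is cheap insurance: a single stray "Negative" quietly widens the label space the model sees, which is exactly the kind of signal the study suggests examples carry.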

Model providers give practical advice that lines up with this. Anthropic’s guidance says, “Examples are one of the most reliable ways to steer Claude’s output format, tone, and structure,” and recommends including 3 to 5 examples for best results.^[2] OpenAI’s guide adds that when you supply examples, “try to show a diverse range of possible inputs with the desired outputs,” so the model does not latch onto an accidental pattern.^[3]
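Both pieces of advice can be combined when choosing examples from a larger pool. The sketch below picks up to `k` examples (default 4, inside the recommended 3 to 5) by cycling through the labels so that no single answer dominates. The helper `pick_examples`, the three-label pool, and the round-robin strategy are illustrative assumptions; neither provider prescribes this exact method.

```python
import random
from collections import defaultdict


def pick_examples(pool, k=4, seed=0):
    """Pick up to k examples, cycling through labels so none dominates.

    Round-robin over labels is one simple, illustrative way to get a
    varied set; it is not a method prescribed by Anthropic or OpenAI.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in pool:
        by_label[label].append((text, label))
    for items in by_label.values():
        rng.shuffle(items)  # vary which inputs are chosen within each label
    labels = sorted(by_label)  # fixed label order keeps results reproducible
    picked = []
    while len(picked) < k and any(by_label[label] for label in labels):
        for label in labels:
            if by_label[label] and len(picked) < k:
                picked.append(by_label[label].pop())
    return picked


# Hypothetical pool of labeled reviews with different lengths and styles.
pool = [
    ("Loved it.", "positive"),
    ("An instant classic with a few slow stretches.", "positive"),
    ("Not worth the ticket.", "negative"),
    ("The sequel repeats every mistake of the original, only louder.", "negative"),
    ("It runs two hours.", "neutral"),
    ("Released in March.", "neutral"),
]
for text, label in pick_examples(pool, k=5):
    print(f"{label:>8}: {text}")
```

Balancing labels is only one axis of diversity. The OpenAI advice is about the range of inputs, so in practice you might also vary length, topic, or phrasing within each label.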

Prompting guidance from model providers

Anthropic prompt engineering
Best-practice guide covering examples, structure, and output control.
OpenAI prompt engineering
Guide to few-shot learning and showing diverse input/output examples.

References

[1] Min et al., “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, EMNLP 2022.
[2] Anthropic, “Prompting best practices”.
[3] OpenAI, “Prompt engineering guide”.
