Key terms
What is inference, and what is fine-tuning?
Inference is what happens every time you use a model: it runs forward on new input with its weights frozen. Fine-tuning is extra training afterward that adjusts those weights to specialize the model for a task.
Inference is the everyday act of using a model. It is “the phase where a trained model processes new, unseen data and returns an output.”[1] Every time you send a prompt and read a reply, that is one inference. The key point is that no learning happens during inference: “the model’s learned weights are fixed, and each input triggers a forward pass to generate outputs.”[1] Training is where the model learned; inference is where it puts that learning to work.
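A minimal sketch of the idea, assuming a toy one-layer linear model rather than a real LLM: the weights, the bias, and the names `forward` and `run_inference` are made up for illustration. It shows that each input triggers a forward pass that reads the weights and never writes to them.

```python
# Toy "trained model": y = w . x + b. All numbers here are hypothetical.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = 0.1

def forward(weights, bias, inputs):
    """One forward pass: reads the weights, never modifies them."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def run_inference(inputs):
    """Inference: apply the fixed weights to new, unseen input."""
    return forward(WEIGHTS, BIAS, inputs)

# Snapshot the weights, then run two separate inferences.
before = list(WEIGHTS)
outputs = [run_inference(x) for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 1.0])]

# No learning happened: the weights are identical to the snapshot.
weights_unchanged = (WEIGHTS == before)
print(outputs, weights_unchanged)
```

A real LLM has billions of weights and a far more complex forward pass, but the property illustrated here is the same: running the model does not change it.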
Fine-tuning is extra training applied to an already-trained model so it does better on a narrower job. Google calls this “additional training” that “unlocks an LLM’s practical side,” teaching it on examples specific to a task.[2] Fine-tuning changes the model’s parameters, but it does not make the model bigger: “a fine-tuned model contains the same number of parameters as the foundation LLM.”[2] It starts from the general model and specializes it.
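To make this concrete, here is a hedged sketch that stands in for fine-tuning with plain gradient descent on a two-weight linear model. The base weights, the task examples, and the function names are assumptions chosen for illustration, and real LLM fine-tuning uses far larger models and more elaborate training procedures. The point is only that the parameter values change while the parameter count does not.

```python
def predict(weights, inputs):
    """Forward pass of a toy linear model."""
    return sum(w * x for w, x in zip(weights, inputs))

def mean_squared_error(weights, examples):
    return sum((predict(weights, x) - y) ** 2 for x, y in examples) / len(examples)

def fine_tune(weights, examples, learning_rate=0.1, steps=200):
    """Extra training on task examples. Returns a NEW weight list."""
    tuned = list(weights)  # start from the already-trained weights
    for _ in range(steps):
        grads = [0.0] * len(tuned)
        for x, y in examples:
            error = predict(tuned, x) - y
            for i, xi in enumerate(x):
                # Gradient of the mean squared error w.r.t. weight i
                grads[i] += 2 * error * xi / len(examples)
        tuned = [w - learning_rate * g for w, g in zip(tuned, grads)]
    return tuned

base_weights = [1.0, 1.0]  # hypothetical "foundation" weights
task_examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

tuned_weights = fine_tune(base_weights, task_examples)

print(len(base_weights), len(tuned_weights))  # same number of parameters
print(mean_squared_error(base_weights, task_examples),
      mean_squared_error(tuned_weights, task_examples))  # lower error on the task
```

Returning a new list instead of overwriting `base_weights` mirrors the way a fine-tuned model is a separate version derived from the general model.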
The distinction matters because changing a prompt is often mistaken for fine-tuning. Adjusting your prompt is not fine-tuning; it changes the input, not the model. Fine-tuning updates the model’s weights and produces a new version. Inference, by contrast, leaves the model untouched and simply runs it.
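One way to see the distinction is to fingerprint the weights. The sketch below is illustrative only: the weights are made up, `fingerprint` and `respond` are hypothetical helpers, and the single hand-written weight update merely stands in for a real fine-tuning run.

```python
import hashlib
import json

def fingerprint(weights):
    """Short hash of the weights, used to tell model versions apart."""
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()[:12]

def respond(weights, prompt_features):
    """Inference: a forward pass that only reads the weights."""
    return sum(w * x for w, x in zip(weights, prompt_features))

model = [0.2, 0.8, -0.5]  # made-up weights
id_before = fingerprint(model)

# Changing the prompt: different input, different output, same model.
reply_a = respond(model, [1.0, 0.0, 1.0])
reply_b = respond(model, [0.0, 1.0, 1.0])
id_after_prompting = fingerprint(model)

# Stand-in for fine-tuning: one weight update produces a new model version.
fine_tuned_model = [w - 0.1 * g for w, g in zip(model, [0.5, -0.3, 0.0])]
id_fine_tuned = fingerprint(fine_tuned_model)

print(id_before == id_after_prompting)  # prompting left the model untouched
print(id_before == id_fine_tuned)       # fine-tuning did not
```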
References
[1] “AI Inference vs Training: Key Differences Explained,” DigitalOcean.
[2] “LLMs: Fine-tuning, distillation, and prompt engineering,” Google for Developers.