Training vs inference
What is the difference between training a model and using it?
Training is when a model learns: it adjusts its internal numbers, called weights, by measuring its errors on data. Inference is when you use the finished model on new input, with the weights held fixed.
A model lives in two modes. The first is training, where it learns. The model makes a prediction, the error is measured against the right answer, and that error is used to nudge the weights so the next prediction is a little better. Repeat this over many examples, up to billions of updates for a large model, and the weights settle into values that capture patterns in the data. The usual method for the nudging is gradient descent: training works by “computing predictions, measuring errors using a loss function, and updating parameters via optimization algorithms like stochastic gradient descent.”[1]
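The predict, measure, nudge loop can be sketched in a few lines of Python. This is a toy illustration, not how large models are actually trained: the model is a single line y = w * x + b, the loss is squared error, and the names train, learning_rate, and epochs, along with the made-up data, are assumptions chosen for the example.

```python
import random


def train(data, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x + b with stochastic gradient descent (toy sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the weights, starting from an arbitrary guess
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit examples in random order
        for x, target in data:
            prediction = w * x + b       # 1. make a prediction
            error = prediction - target  # 2. measure the error (loss = error ** 2)
            # 3. nudge each weight against the gradient of the loss
            w -= learning_rate * 2 * error * x
            b -= learning_rate * 2 * error
    return w, b


# Made-up training data drawn from the line y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]
w, b = train(data)
print(w, b)  # should land close to 3 and 1
```

Each pass through the inner loop is one small update. On this toy data the weights drift from 0 toward the values that generated the data.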
The second mode is inference, where you use the finished model. You give it new input, it produces an answer, and that is the end of it. The weights do not change. During inference “the model’s parameters are fixed, and it processes inputs through a forward pass without updating weights.”[1]
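Inference, by contrast, is only the forward pass. A minimal Python sketch, assuming a toy model y = w * x + b whose weights came from some earlier training run (the values 3.0 and 1.0 and the names WEIGHTS and predict are hypothetical):

```python
from types import MappingProxyType

# Hypothetical weights, as if produced by an earlier training run.
# MappingProxyType gives a read-only view, so they cannot be changed by accident.
WEIGHTS = MappingProxyType({"w": 3.0, "b": 1.0})


def predict(x, weights=WEIGHTS):
    """Forward pass only: no loss, no gradient, no weight update."""
    return weights["w"] * x + weights["b"]


answers = [predict(x) for x in (10.0, -2.0)]
print(answers)  # [31.0, -5.0]
```

Nothing in predict measures an error or writes to the weights, which is the whole difference from the training mode.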
This split has a practical consequence. Training a large model is a heavy, up-front job that can take weeks on many machines. Inference is cheap per request by comparison, but it runs every time someone uses the model, which is why a lot of engineering effort goes into making it fast.