Embedding and more — Physea Wiki

An embedding represents text as a vector of numbers, arranged so similar meanings land near each other. This page also gives short, sourced definitions for a few other terms a newcomer meets often.

An embedding is a way to turn text into numbers a model can compare. OpenAI puts it plainly: “an embedding is a vector (list) of floating point numbers,” and “the distance between two vectors measures their relatedness.”^[1] Items with similar meaning end up close together in that space of numbers, and unrelated items end up far apart. This is the trick behind semantic search and behind systems that find relevant documents to feed a model.
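As a rough illustration of "distance measures relatedness", the sketch below compares toy vectors using cosine similarity, one common relatedness measure. The three-number vectors and the names `toy_embeddings` and `rank_by_similarity` are invented for this example; real embeddings come from a model and are far longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", made up for illustration only.
# A real embedding model would produce these vectors from the text itself.
toy_embeddings = {
    "kitten": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def rank_by_similarity(query, embeddings):
    """Return the other keys, sorted from most to least similar to `query`."""
    q = embeddings[query]
    scores = {
        key: cosine_similarity(q, vec)
        for key, vec in embeddings.items()
        if key != query
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank_by_similarity("kitten", toy_embeddings))
```

With these made-up numbers, "cat" ranks above "invoice" for the query "kitten". That ordering is the behavior semantic search relies on.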

A few more terms come up constantly:

A foundation model is “a very large pre-trained model trained on an enormous and diverse training set.”^[2] It is the broad, general-purpose starting point that more specific models are built from.

Temperature is “a hyperparameter that controls the degree of randomness of a model’s output.”^[2] Lower temperature makes replies more predictable and repetitive; higher temperature makes them more varied and surprising.

A hallucination is “the production of plausible-seeming but factually incorrect output by a generative AI model.”^[2] The text reads confidently, but the facts are wrong, which is why checking a model’s claims still matters.
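To make the temperature definition above concrete, here is a minimal sketch of one common way temperature is applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with softmax. The glossary quote does not spell out this mechanism, so treat it as an assumption. The `logits` values and the function name `softmax_with_temperature` are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (not from a real model).
logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

In this sketch, a lower temperature concentrates probability on the highest-scoring token, so sampling becomes more predictable. A higher temperature spreads probability more evenly, so sampling becomes more varied.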

Where these definitions come from

Google ML Glossary
Free, authoritative glossary covering generative AI terms like prompt, temperature, and hallucination.
OpenAI Embeddings guide
Vendor documentation that defines embeddings and shows how vector distance measures relatedness.
Microsoft Learn: tokens
A clear walkthrough of tokens, tokenization, vocabulary, and the context window.

References

1. Vector embeddings. OpenAI.
2. Machine Learning Glossary: Generative AI. Google for Developers.
