Embeddings & similarity

Embeddings turn text into vectors so passages with similar meaning land near each other. Documents and queries must use the same model, and retrieval becomes a nearest-neighbor search for the top-K closest chunks.

The machinery that makes retrieval work is embeddings: turning text into vectors so that passages with similar meaning land near each other in vector space. You must embed your documents and your query with the same model, because vectors produced by different models live in different spaces and cannot be meaningfully compared. How well an embedding model captures meaning is measured by benchmarks like MTEB, the Massive Text Embedding Benchmark.^[1] Retrieval then becomes a nearest-neighbor search: find the top-K chunks closest to the query vector, typically scored with cosine similarity or a related distance measure. The Language Model Architecture subject goes deeper on how this representation works.
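To make the nearest-neighbor step concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the closeness measure, which is a common choice but not the only one. The three-dimensional vectors are invented by hand for illustration and do not come from a real embedding model, and the helper names cosine_similarity and top_k are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]

# Toy data: hand-made vectors standing in for real embeddings (assumption).
chunks = [
    "how to reset a password",
    "account recovery steps",
    "quarterly sales figures",
]
chunk_vecs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
]
# In a real system this would come from the same model that embedded the chunks.
query_vec = [0.85, 0.2, 0.05]

results = top_k(query_vec, chunk_vecs, k=2)
for i, score in results:
    print(f"{score:.3f}  {chunks[i]}")
```

This brute-force scan compares the query against every chunk, which is fine for small collections. Larger collections usually rely on an approximate nearest-neighbor index instead, but the idea of ranking chunks by similarity to the query vector stays the same.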

References

[1] Massive Text Embedding Benchmark (MTEB), Hugging Face
