Embeddings & vectors
How do embeddings power semantic search and RAG?
Semantic search embeds both the query and the documents and ranks by vector similarity, so it matches meaning rather than keywords. The same machinery underpins retrieval-augmented generation, which fetches relevant passages from a vector index and conditions a model's answer on them.
This is where embeddings earn their keep. Cohere frames the contrast with keyword search directly: semantic search “solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.”[1] A query such as “how do I reset my password” can therefore match a page titled “recovering account access,” even though the two share no keywords.
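To make the ranking step concrete, here is a minimal sketch of similarity-based search. It assumes the query and the documents have already been embedded; the three-dimensional vectors and the page titles other than “Recovering account access” are hand-made stand-ins for illustration, not output from any real model. Cosine similarity is used here as one common choice of vector similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical document vectors. A real system would get these from an
# embedding model and they would have hundreds of dimensions, not three.
DOC_VECTORS = {
    "Recovering account access": [0.9, 0.1, 0.0],
    "Pricing and billing plans": [0.1, 0.9, 0.1],
    "Exporting data to CSV": [0.0, 0.2, 0.9],
}

# Hypothetical vector standing in for the embedded query
# "how do I reset my password".
QUERY_VECTOR = [0.8, 0.2, 0.1]


def rank(query_vec, doc_vecs):
    """Return (score, title) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), title) for title, vec in doc_vecs.items()]
    return sorted(scored, reverse=True)


ranked = rank(QUERY_VECTOR, DOC_VECTORS)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

With these made-up vectors the account-recovery page ranks first because its vector points in nearly the same direction as the query's, even though the query text and the title have no words in common. At real scale the exhaustive loop would normally be replaced by a vector index.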
The same machinery underpins retrieval-augmented generation. The RAG paper paired a generative model with “a dense vector index of Wikipedia, accessed with a pre-trained neural retriever,” fetching relevant passages and conditioning the answer on them.[2] The RAG and retrieval page walks through that pipeline end to end.
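The retrieve-then-condition flow can be sketched in a few lines. This is a simplified illustration, not the paper's method: the passages, their vectors, the query vector, and the `retrieve` and `build_prompt` helpers are all hypothetical, and conditioning is approximated by placing the retrieved passages in a prompt rather than by the paper's jointly trained retriever and generator. The call to an actual model is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vector index: (passage, embedding) pairs with made-up vectors.
INDEX = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("The Nile flows through Egypt.", [0.1, 0.9, 0.1]),
    ("France borders Spain and Italy.", [0.7, 0.2, 0.3]),
]


def retrieve(query_vec, index, k=2):
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble a prompt that conditions the answer on the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "What is the capital of France?"
query_vec = [0.85, 0.1, 0.1]  # stand-in for the embedded question

passages = retrieve(query_vec, INDEX, k=2)
prompt = build_prompt(question, passages)
print(prompt)  # this string would be sent to a generative model
```

Only the two France-related passages reach the prompt; the unrelated one is filtered out by the similarity ranking before the model ever sees it.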
References
- [1] Semantic Search with Embeddings. Cohere.
- [2] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Lewis et al., NeurIPS 2020.