What is RAG and what are its two phases?

Retrieval-augmented generation (RAG) lets a model answer from your own data by fetching relevant text and adding it to the prompt. It always runs in two phases: index your documents once offline, then retrieve the closest chunks at question time.

Last updated 2026-06-15 · Physea Labs

A model knows only what it absorbed during training, and that knowledge is frozen at a cutoff date. Retrieval-augmented generation (RAG) is the standard way around that limit: before the model answers, you fetch relevant text from your own data and add it to the prompt. The term and the method come from Lewis and colleagues in 2020, who paired a model’s built-in (parametric) memory with an external, searchable (non-parametric) memory.[1]

RAG always has two phases, and keeping them separate is the key to understanding it.

Index, once, offline. You take your documents, split them into chunks, turn each chunk into a vector with an embedding model, and store those vectors in an index. This is preparation; it happens before any question.
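The index phase can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: `chunk_text`, `toy_embed`, and `build_index` are hypothetical helper names, the word-window chunking is just one simple strategy, and `toy_embed` is a word-count stand-in for a real embedding model. A real system would call an actual embedding model and store the vectors in a vector index.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (one simple chunking strategy)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """ASSUMPTION: stand-in for an embedding model.

    Returns a sparse, L2-normalized word-count vector as {token: weight}.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def build_index(documents):
    """Index phase: chunk every document, embed each chunk, store (vector, chunk) pairs."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc):
            index.append((toy_embed(chunk), chunk))
    return index


documents = [
    "Refunds are issued within 14 days of a return.",
    "The office is closed on public holidays.",
]
index = build_index(documents)
print(len(index), "chunks indexed")
```

Nothing here depends on a question, which is why this work can be done once and reused for every later query.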

Retrieve, at question time. When a user asks something, you embed the question with the same model, search the index for the closest chunks, add them to the prompt, and let the model generate an answer grounded in that text.[2]
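The retrieve phase can be sketched the same way. Again, this is an illustration under assumptions rather than a reference implementation: `toy_embed`, `retrieve`, and `build_prompt` are hypothetical names, `toy_embed` is a word-count stand-in for a real embedding model, the prompt wording is made up, and the brute-force scan stands in for a proper vector index. The final call to a language model is left out.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """ASSUMPTION: stand-in for an embedding model (sparse, L2-normalized word counts)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def retrieve(question, index, k=2):
    """Return the k chunks whose vectors are closest to the question vector."""
    q = toy_embed(question)  # must be the same embedding function used at index time
    scored = []
    for vec, chunk in index:
        # Dot product of unit vectors equals cosine similarity.
        score = sum(weight * vec.get(token, 0.0) for token, weight in q.items())
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question, chunks):
    """Augment step: put the retrieved text in front of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# A tiny pre-built index of (vector, chunk) pairs, as the index phase would produce.
chunks = [
    "Refunds are issued within 14 days of a return.",
    "The office is closed on public holidays.",
    "Support is available by email around the clock.",
]
index = [(toy_embed(c), c) for c in chunks]

question = "How many days do refunds take?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The question shares the words "refunds" and "days" only with the first chunk, so that chunk scores highest and is the one added to the prompt. A real embedding model would also match on meaning, not just shared words.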

[Figure: Index once (offline): Documents → Chunk → Embed → Vector index. Retrieve (at question time): Question → Embed → Top-K search (reads the index) → Augment → Answer.]
RAG has two phases: index your documents once, then retrieve the closest chunks at question time.

RAG building blocks

  • FAISS

    Meta's open-source library for fast similarity search over vectors; a common local starting point.

  • Pinecone

    Managed vector database with a widely cited RAG explainer.

  • LlamaIndex

    Framework for building the indexing and retrieval pipeline around your data.

  • MTEB

    Benchmark for comparing embedding models before you pick one.

References

  1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020) — arXiv
  2. What is Retrieval-Augmented Generation (RAG)? — Pinecone