The library

Papers, in plain language

The primary sources this wiki is built on. Each entry is the short version: what the paper introduced and why it matters, with a link to read the original. Every link here is a source the wiki actually cites.

2013

Efficient Estimation of Word Representations in Vector Space (word2vec) ↗
Mikolov, Chen, Corrado, Dean · arXiv · Retrieval & embeddings

Showed you could learn useful word vectors cheaply at scale, and that the resulting space captures meaning: similar words sit close together, and relationships show up as consistent directions. The seed of modern embeddings.
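
To make "close together" and "consistent directions" concrete, here is a minimal sketch. The three-dimensional vectors are hand-picked for illustration and are not real word2vec output; the helper names cosine and analogy are hypothetical.

```python
import math

# Toy vectors chosen by hand (assumption): dimensions loosely mean
# (royalty, male, female). Real word2vec vectors are learned, not designed.
vectors = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.2, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words so the answer is a new word.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "king" minus "man" plus "woman" lands on "queen" in this toy space.
print(analogy("king", "man", "woman", vectors))  # queen
```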

2014

Neural Machine Translation by Jointly Learning to Align and Translate ↗
Bahdanau, Cho, Bengio · arXiv (ICLR 2015) · Architecture

Introduced attention. Instead of squeezing a whole sentence into one fixed vector, the model learns to "soft-search" the source for the parts relevant to each output word. This idea became the core of the transformer three years later.

2017

Attention Is All You Need ↗
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin · arXiv (NeurIPS 2017) · Architecture

The transformer. It dropped recurrence entirely and built a model on attention alone, so every token can look at every other in parallel. It trained faster and scaled better than anything before it, and it underlies essentially every modern LLM.
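
The core operation can be sketched in a few lines. This is a simplified, pure-Python version of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, for a single head; it leaves out the learned projections, multiple heads, and masking a real transformer uses, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Each query is handled independently of the others, which is what
# lets a real implementation process all tokens in parallel.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0]], keys, values))
```

In the example, the query points the same way as the first key, so nearly all of the weight goes to the first value.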

2019

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks ↗
Reimers, Gurevych · EMNLP-IJCNLP · Retrieval & embeddings

Made it practical to embed whole sentences so they can be compared by cosine similarity. Finding the most similar pair in 10,000 sentences dropped from 65 hours to about 5 seconds, which is what made large-scale semantic search feasible.
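
The speedup comes from where the expensive work happens. Comparing every pair among n sentences means n(n-1)/2 comparisons; if each one needs a full model pass, the cost is prohibitive, but if each sentence is embedded once, the pairwise step is only cheap vector arithmetic. A rough sketch, with made-up three-dimensional embeddings standing in for real model output and hypothetical function names:

```python
import math
from itertools import combinations

def pair_count(n):
    """Number of distinct pairs among n sentences."""
    return n * (n - 1) // 2

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def most_similar_pair(embeddings):
    """embeddings maps sentence -> vector; returns (score, sentence_a, sentence_b)."""
    scored = ((cosine(embeddings[a], embeddings[b]), a, b)
              for a, b in combinations(embeddings, 2))
    return max(scored, key=lambda item: item[0])

# Assumption: these vectors are invented for illustration, not produced
# by Sentence-BERT. In practice each would come from one model pass.
embeddings = {
    "A cat sits on the mat": [0.9, 0.1, 0.0],
    "A kitten rests on the rug": [0.8, 0.2, 0.1],
    "Stocks fell sharply today": [0.0, 0.1, 0.9],
}

print(pair_count(10_000))  # 49995000 pairs for 10,000 sentences
print(most_similar_pair(embeddings)[1:])
```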

2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks ↗
Lewis et al. · NeurIPS · Retrieval & embeddings

Coined RAG. It paired a generative model with a searchable index of documents, retrieving relevant passages and conditioning the answer on them. The result was more factual and could draw on knowledge outside the model’s training data.
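
The retrieve-then-condition pattern can be sketched as follows. This is a toy under stated assumptions: the word-overlap scoring is a stand-in for the learned retriever a real system would use, the function names and prompt wording are hypothetical, and the generation step is left out, since the sketch only builds the text a generator would be given.

```python
def tokenize(text):
    """Lowercase and split into a set of words, dropping basic punctuation."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question (a toy retriever)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Condition the answer on retrieved passages by placing them in the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

documents = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "Python is a programming language.",
]

question = "Where is the Eiffel Tower?"
passages = retrieve(question, documents, k=1)
print(build_prompt(question, passages))
```

Because the index is separate from the model, updating the documents changes what the system can answer without retraining anything.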

2022

MTEB: Massive Text Embedding Benchmark ↗
Muennighoff, Tazi, Magne, Reimers · arXiv (EACL 2023) · Evaluation

A broad benchmark for embedding models, spanning 8 task types, 58 datasets, and 112 languages. Its headline finding is a useful caution: no single embedding model wins everywhere, so the right choice depends on what you are doing.

2023

Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection ↗
Greshake, Abdelnabi, Mishra, Endres, Holz, Fritz · arXiv (AISec) · Safety

Formalized indirect prompt injection: hiding instructions in data a model later reads (a web page, a document) so an attacker who never talks to the model can still control it. Demonstrated practical attacks against real-world systems, including Bing Chat.

2024

Building Effective Agents ↗
Anthropic · Anthropic (essay) · Agents

The reference guide on the difference between workflows (predefined paths) and agents (the model directs itself), with the practical advice to use the simplest thing that works and only add autonomy when the task needs it.

2025

Measuring AI Ability to Complete Long Tasks ↗
METR · METR (study) · Evaluation

Measured AI capability by task length: how long a task, timed by how long it takes a human expert, a model can complete at a given success rate. Models proved far more dependable on short tasks than long ones, and that time horizon has been doubling roughly every seven months. A grounded way to think about why agents still need guardrails.

2025

Defeating Prompt Injections by Design (CaMeL) ↗
Debenedetti et al., Google DeepMind · arXiv · Safety

A by-design defense against prompt injection: split a privileged planner model from a quarantined model that reads untrusted data, and track data provenance so tainted data cannot trigger dangerous actions. Strong results, but not a complete fix.
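
The provenance idea can be illustrated with a toy. This is not CaMeL's implementation, which is considerably more involved; it only sketches the principle that values carry a record of where they came from and that a sensitive action checks that record before running. The names Tagged, combine, send_email, and PolicyError, and the policy itself, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    value: str
    trusted: bool  # True only if the data came from the user, not untrusted content

def from_user(text):
    return Tagged(text, trusted=True)

def from_untrusted(text):
    """Wrap data read from a web page, document, or email."""
    return Tagged(text, trusted=False)

def combine(*parts):
    """Derived data is trusted only if every input was trusted."""
    return Tagged("".join(p.value for p in parts),
                  all(p.trusted for p in parts))

class PolicyError(Exception):
    pass

def send_email(recipient, body):
    """A sensitive tool. Hypothetical policy: the recipient must be trusted."""
    if not recipient.trusted:
        raise PolicyError("recipient derived from untrusted data")
    return f"sent to {recipient.value}"

# The user chose the recipient, so sending an untrusted document is allowed.
print(send_email(from_user("bob@example.com"), from_untrusted("report text")))

# An address lifted from the document itself is blocked.
try:
    send_email(from_untrusted("attacker@example.com"), from_user("hi"))
except PolicyError as err:
    print("blocked:", err)
```

The check runs on the data's origin rather than on its content, so it does not depend on detecting whether the text looks malicious.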