The library
Papers, in plain language
The primary sources this wiki is built on. Each entry is the short version: what the paper introduced and why it matters, with a link to read the original. Every link here is a source the wiki actually cites.
- 2013 Efficient Estimation of Word Representations in Vector Space (word2vec) ↗
Showed you could learn useful word vectors cheaply at scale, and that the resulting space captures meaning: similar words sit close together, and relationships show up as consistent directions. The seed of modern embeddings.
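The "consistent directions" idea can be sketched with a toy example. The vectors below are hand-made for illustration and are not real word2vec output; the three axes (royalty, maleness, femaleness) and the helper names `cosine` and `nearest` are assumptions of this sketch.

```python
import math

# Hand-made 3-dimensional toy vectors (NOT real word2vec output).
# Axes, by construction: [royalty, maleness, femaleness].
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(target, exclude):
    """Return the vocabulary word whose vector points most nearly along target."""
    candidates = (w for w in toy_vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(toy_vectors[w], target))

# Take "king", subtract the "man" direction, add the "woman" direction.
analogy = [
    k - m + w
    for k, m, w in zip(toy_vectors["king"], toy_vectors["man"], toy_vectors["woman"])
]
result = nearest(analogy, exclude={"king", "man", "woman"})
print(result)  # queen
```

With real embeddings the space has hundreds of dimensions and the axes are learned rather than named, but the arithmetic is the same.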
- 2014 Neural Machine Translation by Jointly Learning to Align and Translate ↗
Introduced attention. Instead of squeezing a whole sentence into one fixed vector, the model learns to "soft-search" the source for the parts relevant to each output word. This idea became the core of the transformer three years later.
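A minimal sketch of the soft-search step, under simplifying assumptions: relevance is scored here with a plain dot product, whereas the paper learns a small scoring network, and the vectors and the names `softmax` and `attend` are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, source_states):
    """Weight every source position by its relevance to the query.

    Returns the weights and the weighted average of the source states
    (the context vector used to produce the next output word).
    """
    # Dot-product scoring is a simplification assumed for this sketch.
    scores = [sum(q * s for q, s in zip(query, state)) for state in source_states]
    weights = softmax(scores)
    dim = len(source_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context

# Three source positions, each a 2-dimensional toy vector.
source_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [4.0, 0.0]  # toy decoder state that "wants" the first position

weights, context = attend(query, source_states)
print([round(w, 3) for w in weights])  # most of the weight lands on position 0
```

The point is that nothing is thrown away: every source position contributes, just in proportion to how relevant it looks for the current output word.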
- 2017 Attention Is All You Need ↗
The transformer. It dropped recurrence entirely and built a model on attention alone, so every token can look at every other in parallel. It trained faster and scaled better than the recurrent models it replaced, and it underlies essentially every modern LLM.
- 2019 Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks ↗
Made it practical to embed whole sentences so they can be compared by cosine similarity. Finding the most similar pair in 10,000 sentences dropped from 65 hours to about 5 seconds, which is what made large-scale semantic search feasible.
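The arithmetic behind that speedup can be sketched briefly. Comparing every pair of 10,000 sentences means about 50 million comparisons; if each one needs a full model pass, that is slow, but if each sentence is embedded once, each comparison is only a cheap cosine computation. The toy embeddings and function names below are assumptions for illustration, not output from the actual model.

```python
import math
from itertools import combinations

def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_pair(embeddings):
    """Brute-force search over all pairs of precomputed embeddings."""
    return max(
        combinations(range(len(embeddings)), 2),
        key=lambda pair: cosine(embeddings[pair[0]], embeddings[pair[1]]),
    )

print(pair_count(10_000))  # 49995000

# Four made-up 2-dimensional "sentence embeddings".
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(most_similar_pair(toy_embeddings))  # (0, 2)
```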
- 2020 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks ↗
Coined RAG. It paired a generative model with a searchable index of documents, retrieving relevant passages and conditioning the answer on them. The result was more factual and could draw on knowledge outside the model’s training data.
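The retrieve-then-condition loop can be sketched in a few lines. This is an illustration only: the paper uses a learned dense retriever and a trained generator, while the sketch below ranks by simple word overlap and stops at building the prompt. The function names, the prompt wording, and the tiny document set are all assumptions.

```python
def tokenize(text):
    """Lowercase and split on whitespace; crude, but enough for a sketch."""
    return set(text.lower().split())

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question and keep the top k.

    Word overlap stands in for the paper's learned retriever.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, passages):
    """Condition the generator on the retrieved passages (hypothetical wording)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the passages below.\n{context}\nQuestion: {question}"

documents = [
    "The transformer architecture was introduced in 2017.",
    "Sentence-BERT produces sentence embeddings for cosine similarity.",
    "Word2vec learns word vectors from large text corpora.",
]
question = "When was the transformer architecture introduced?"

passages = retrieve(question, documents, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this string would then be passed to a generative model
```

Because the answer is grounded in whatever the index holds, updating the documents updates what the system can say, with no retraining.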
- 2022 MTEB: Massive Text Embedding Benchmark ↗
A broad benchmark for embedding models across many tasks and languages. Its headline finding is a useful caution: no single embedding model wins everywhere, so the right choice depends on what you are doing.
- 2023 Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection ↗
Formalized indirect prompt injection: hiding instructions in data a model later reads (a web page, a document) so an attacker who never talks to the model can still control it. Demonstrated real attacks against a production system.
- 2024 Building Effective Agents ↗
The reference guide on the difference between workflows (predefined paths) and agents (the model directs itself), with the practical advice to use the simplest thing that works and only add autonomy when the task needs it.
- 2025 Measuring AI Ability to Complete Long Tasks ↗
Measured how long a task an AI can complete reliably, where a task's length is the time it takes a human to do it. Models proved far more dependable on short tasks than long ones, and that time horizon has been growing from one model generation to the next. A grounded way to think about why agents still need guardrails.
- 2025 Defeating Prompt Injections by Design (CaMeL) ↗
A by-design defense against prompt injection: split a privileged planner model from a quarantined model that reads untrusted data, and track data provenance so tainted data cannot trigger dangerous actions. Strong results, but not a complete fix.
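The provenance-tracking idea can be sketched as a toy. This is a heavy simplification, not the paper's system: here every value carries a single trusted/untrusted flag, and one hypothetical tool, `send_email`, refuses to act when its recipient came from untrusted data. All names and the policy itself are assumptions of the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    """A piece of data plus a record of where it came from."""
    data: str
    trusted: bool

def from_user(text):
    """Data typed by the user: trusted."""
    return Value(text, trusted=True)

def from_untrusted(text):
    """Data read from a web page, document, or email: untrusted."""
    return Value(text, trusted=False)

def combine(*values):
    """Anything derived from untrusted data is itself untrusted."""
    return Value(
        " ".join(v.data for v in values),
        trusted=all(v.trusted for v in values),
    )

class PolicyError(Exception):
    """Raised when tainted data would steer a sensitive action."""

def send_email(recipient, body):
    """Hypothetical tool. Policy: the recipient must come from trusted data."""
    if not recipient.trusted:
        raise PolicyError("recipient was derived from untrusted data")
    return f"sent to {recipient.data}"

# The user names the recipient; the body is a summary of an untrusted document.
print(send_email(from_user("bob@example.com"), from_untrusted("summary text")))

# An address planted inside the document cannot redirect the email.
try:
    send_email(from_untrusted("attacker@example.com"), from_user("hello"))
except PolicyError as err:
    print("blocked:", err)
```

Note that the untrusted body is still allowed through: the policy restricts which data may choose the action's target, not whether untrusted data may be handled at all.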