Embeddings & vectors
How do you pick an embedding model?
Embeddings differ in quality by task and by language, so they are benchmarked. The Massive Text Embedding Benchmark (MTEB) spans many task types and languages, and its headline finding is a useful warning against shopping for a single winner: “no particular text embedding method dominates across all tasks.”[1] The right embedding model depends on what you are doing with it.
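Because no model wins everywhere, a practical next step is to score a few candidates on a small labeled sample of your own task. The sketch below assumes a hypothetical `EmbedFn` interface (text in, list of floats out) that you would wrap around whichever model you are testing. It uses recall@k as the metric, which is only one reasonable choice. The two toy embedders are stand-ins so the code runs offline; they are not real models.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical interface: wrap whichever embedding model you are testing
# (hosted API or local) in a function with this signature.
EmbedFn = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(embed: EmbedFn, queries: Sequence[str], docs: Sequence[str],
                relevant: Sequence[int], k: int) -> float:
    """Fraction of queries whose relevant doc index lands in the top k results."""
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, rel in zip(queries, relevant):
        q_vec = embed(query)
        # Rank all docs by similarity to this query, best first.
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]),
                        reverse=True)
        if rel in ranked[:k]:
            hits += 1
    return hits / len(queries)


def compare_models(models: Dict[str, EmbedFn], queries: Sequence[str],
                   docs: Sequence[str], relevant: Sequence[int],
                   k: int = 1) -> Dict[str, float]:
    """Score every candidate model on the same labeled set."""
    return {name: recall_at_k(fn, queries, docs, relevant, k)
            for name, fn in models.items()}


# Toy stand-ins for real models, used only so the sketch runs offline.
VOCAB = ["cat", "dog", "car", "engine"]


def word_count_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def length_embed(text: str) -> Vector:
    # Deliberately poor: every text maps to a 1-D vector, so all cosines tie.
    return [float(len(text))]


docs = ["the cat sat", "a dog barked", "car engine noise"]
queries = ["cat", "dog", "engine"]
relevant = [0, 1, 2]  # index of the correct doc for each query

scores = compare_models({"word_counts": word_count_embed,
                         "text_length": length_embed},
                        queries, docs, relevant, k=1)
print(scores)
```

On this toy set the word-count embedder retrieves every relevant document while the length-based one retrieves only one of three. That is the kind of gap worth surfacing on your own queries and languages before committing to a model.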
Embedding models & benchmarks
- OpenAI Embeddings
Hosted embedding models with a clear guide; recommends cosine similarity and notes that its embeddings are normalized to length 1, so a plain dot product gives the same score.
- Cohere Embed
Hosted embeddings with a documented semantic-search workflow.
- MTEB
Open benchmark for comparing embedding models across tasks and languages.
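The note about length-1 vectors and the semantic-search workflow rest on the same arithmetic. For unit vectors, cosine similarity equals the dot product, so search reduces to embedding the documents once, embedding each query, and ranking documents by dot product. The following is a minimal pure-Python sketch under that assumption. It is not either vendor's SDK, and the 2-D vectors are toy stand-ins for real embeddings; a production system would typically use an array library or a vector index instead.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to length 1 so that dot products equal cosine similarities."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Full cosine formula, valid for vectors of any nonzero length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query: Vector, doc_vecs: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Rank documents by dot product; assumes every vector is already unit length."""
    scored = [(i, dot(query, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))          # full formula
print(dot(normalize(a), normalize(b)))  # same value, up to float rounding

# "Index" the documents once (normalized), then rank them against a query.
doc_vecs = [normalize(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0])]
query = normalize([1.0, 0.1])
print(top_k(query, doc_vecs, k=2))
```

Normalizing once at indexing time moves the cost of the division out of the query path, which is why pre-normalized embeddings make dot-product ranking sufficient.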
References
- [1] Muennighoff et al., "MTEB: Massive Text Embedding Benchmark," EACL 2023.