Picking a model — Physea Wiki

How do you pick an embedding model?

Embeddings differ in quality by task and by language, so they are benchmarked. The Massive Text Embedding Benchmark (MTEB) spans many task types and languages, and its headline finding is a useful warning against shopping for a single winner: “no particular text embedding method dominates across all tasks.”^[1] The right embedding model depends on what you are doing with it.
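Because no single model wins everywhere, one practical reading of the MTEB result is to score a few candidate models on a small labeled sample of your own task. The sketch below makes several assumptions. The task is retrieval. You have already embedded the queries and documents with each candidate model; the embedding call itself is omitted. Each query has exactly one known relevant document. `normalize`, `recall_at_k`, and the toy 2-D vectors are hypothetical illustrations, not part of MTEB or any provider's API.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def recall_at_k(queries: list[list[float]], docs: list[list[float]],
                relevant: list[int], k: int = 1) -> float:
    """Fraction of queries whose relevant document ranks in the top k.

    queries, docs: embeddings produced by ONE candidate model.
    relevant[i]:   index into docs of the correct answer for queries[i].
    """
    unit_docs = [normalize(d) for d in docs]
    hits = 0
    for query, rel in zip(queries, relevant):
        q = normalize(query)
        # Cosine similarity of this query to every document.
        scores = [sum(a * b for a, b in zip(q, d)) for d in unit_docs]
        # Indices of the k highest-scoring documents.
        top_k = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)[:k]
        hits += rel in top_k
    return hits / len(queries)

# Toy 2-D vectors standing in for two hypothetical models' embeddings
# of the same two queries against the same two documents.
docs = [[1.0, 0.0], [0.0, 1.0]]
relevant = [0, 1]                      # query 0 -> doc 0, query 1 -> doc 1
queries_a = [[0.9, 0.1], [0.2, 0.8]]  # "model A"
queries_b = [[0.1, 0.9], [0.8, 0.2]]  # "model B"
print(recall_at_k(queries_a, docs, relevant))  # 1.0
print(recall_at_k(queries_b, docs, relevant))  # 0.0
```

In a real comparison the sample would come from your own queries. Recall@k would also give way to whatever metric fits the task; MTEB itself uses different metrics for different task types.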

Embedding models & benchmarks

OpenAI Embeddings
Hosted embedding models with a clear guide; recommends cosine similarity and returns vectors normalized to length 1, so a plain dot product gives the same score.
Cohere Embed
Hosted embeddings with a documented semantic-search workflow.
MTEB
Open benchmark for comparing embedding models across tasks and languages.
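The length-1 note in the OpenAI entry is easy to check numerically. Cosine similarity is dot(a, b) / (|a| |b|). Once both vectors have length 1, the denominator is 1, so the dot product alone gives the same score and the same ranking. Below is a minimal sketch of the ranking step in a semantic-search workflow. It assumes the query and documents were already embedded by the same model. `cosine_similarity`, `to_unit`, `search`, and the toy 3-D vectors are hypothetical stand-ins, not OpenAI's or Cohere's client APIs.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|), for vectors of any length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def to_unit(v: list[float]) -> list[float]:
    """Rescale v to length 1 (unnecessary if the provider already does it)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def search(query: list[float], docs: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k documents most similar to query, best first."""
    q = to_unit(query)
    # With length-1 vectors the plain dot product is the cosine similarity.
    scores = [sum(x * y for x, y in zip(q, to_unit(d))) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]

# The identity on two unit vectors: both values come out at about 0.96.
a, b = to_unit([3.0, 4.0]), to_unit([4.0, 3.0])
print(cosine_similarity(a, b), sum(x * y for x, y in zip(a, b)))

# Toy 3-D "embeddings"; real models return hundreds or thousands of dimensions.
docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.6, 0.6, 0.2]]
query = [1.0, 0.2, 0.0]
print(search(query, docs, k=2))  # [0, 2]
```

Normalizing once when documents are indexed and scoring with dot products afterwards is a common design choice. It is cheaper than recomputing norms for every comparison and yields the same ranking as cosine similarity.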

References

[1] Muennighoff et al., "MTEB: Massive Text Embedding Benchmark," EACL 2023.
