Words to sentences — Physea Wiki

Early embeddings were per-word, but real applications need one vector for a whole sentence or passage. Sentence-BERT adapted BERT with siamese and triplet networks to derive comparable sentence embeddings, cutting similarity search from hours to seconds.

Early embeddings were per-word. Real applications need a vector for a whole sentence or passage. Sentence-BERT adapted BERT for exactly this, using “siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity,” and the speedup was dramatic: finding the most similar pair in a collection of 10,000 sentences dropped “from 65 hours with BERT / RoBERTa to about 5 seconds.”^[1]

References

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks — Reimers & Gurevych, EMNLP 2019

How do embeddings move from single words to whole sentences?

References