MTEB: Massive Text Embedding Benchmark
Gecko: Versatile Text Embeddings Distilled from Large Language Models
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Don’t use cosine similarity carelessly
Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
The Best Way to Use Text Embeddings Portably is With Parquet and Polars
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models