Open Source Embedding Models
Embeddings are the unglamorous engine under semantic search and RAG, and at scale the bottleneck often isn't model quality but throughput: how many vectors per second you can produce before the inference layer, rather than the database, becomes the thing you wait on. The open source tools here run the embedding model on your own hardware at high batch throughput, so you can re-embed a whole corpus or serve live queries without a per-vector API charge metering every document you've ever indexed.
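As a rough illustration of why throughput dominates at scale, the sketch below estimates how long a full re-embed takes at a given sustained rate. The function name `reembed_hours` and the figures used (10 million documents, 4 chunks each, 2,000 vectors per second) are hypothetical, chosen only to show the arithmetic; they are not measurements of any tool listed here.

```python
def reembed_hours(num_docs: int, vectors_per_second: float, chunks_per_doc: int = 1) -> float:
    """Estimate wall-clock hours to embed a corpus at a sustained throughput.

    Assumes one vector per chunk and a constant rate; real runs will vary
    with batch size, sequence length, and hardware.
    """
    if vectors_per_second <= 0:
        raise ValueError("vectors_per_second must be positive")
    total_vectors = num_docs * chunks_per_doc
    seconds = total_vectors / vectors_per_second
    return seconds / 3600


# Hypothetical corpus: 10M documents split into 4 chunks each,
# embedded at an assumed 2,000 vectors per second.
hours = reembed_hours(num_docs=10_000_000, vectors_per_second=2_000, chunks_per_doc=4)
print(f"{hours:.1f} hours")  # prints "5.6 hours"
```

Doubling the sustained rate halves the wall-clock time, which is why batch throughput, more than any single-request latency, decides how often a corpus can realistically be re-embedded.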

Sentence Transformers
Python framework for embeddings, semantic search, retrieval, reranking, and model fine-tuning

Text Embeddings Inference
Inference server for open source embedding and sequence classification models, distributed as Docker images for different hardware backends

Infinity
High-throughput REST API for serving text embeddings, reranking, CLIP, CLAP, and ColPali models

Model2Vec
Python toolkit for turning sentence transformers into small, fast static embedding models