Question 1

Are these the embedding models themselves or tools to run them?

Accepted Answer

They are the tooling, not the model weights. Sentence Transformers is a library for computing and training embeddings, Text Embeddings Inference and Infinity are servers that put a model behind an API, and Model2Vec is a technique for shrinking a model. All of them load actual model checkpoints, most often from Hugging Face. So you pick a tool for how you will run embeddings and a separate model for how well it retrieves on your data.

Question 2

Do open source embedding tools require a GPU?

Accepted Answer

Not always. Smaller models run acceptably on CPU for batch jobs or low-volume search, and Text Embeddings Inference supports CPU and Apple Metal alongside CUDA. Model2Vec goes further, distilling a model for up to 500x faster CPU inference, which makes CPU-only or edge use realistic. A GPU matters mainly for high throughput or frequent re-embedding. The real question is tokens per second under your chunk and batch sizes, so test with production-like text.
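One way to measure tokens per second is a small timing harness around whatever embedding call you use. This is a minimal sketch: `fake_embed` is a hypothetical stand-in for a real model or server call, and the whitespace token count is only an approximation of what the model's tokenizer would report.

```python
import time
from typing import Callable, List, Sequence


def tokens_per_second(
    embed: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    batch_size: int,
    count_tokens: Callable[[str], int] = lambda t: len(t.split()),
) -> float:
    """Time `embed` over `texts` in batches and return approximate tokens/sec."""
    total_tokens = sum(count_tokens(t) for t in texts)
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        embed(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical placeholder: swap in your real model or server call here.
    return [[float(len(text))] for text in batch]


# Use text that resembles your production chunks, not short toy strings.
sample = ["a production-like chunk of text goes here"] * 64
rate = tokens_per_second(fake_embed, sample, batch_size=16)
print(f"{rate:.0f} tokens/sec (approximate)")
```

Running the same harness with different batch sizes and chunk lengths on CPU and GPU gives numbers you can compare directly.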

Question 3

What is the difference between dense, sparse, and hybrid retrieval?

Accepted Answer

Dense retrieval compares embedding vectors, so it matches meaning across different wording. Sparse retrieval weights individual tokens, so it handles exact terms, rare identifiers, and names well. Hybrid retrieval combines both signals and is often more robust for catalogs, legal text, logs, or code full of exact tokens. Sentence Transformers covers dense embedding models, sparse embedding models, and cross-encoder rerankers, so you can build a hybrid pipeline without stitching together unrelated libraries.
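The answer does not say how to combine the two signals. One common option is reciprocal rank fusion (RRF), which merges ranked lists without needing comparable scores. The sketch below assumes you already have one ranked list of document IDs from each retriever; the IDs and the constant `k=60` are illustrative choices, not something any of the tools prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense retriever output
sparse = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse retriever output
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still stays in the list.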

Question 4

How should I evaluate an embedding model for my own search?

Accepted Answer

Use your own queries and documents, even if the test set starts small. Create expected matches, near misses, and irrelevant items that share vocabulary, then measure recall at the number of results your UI or RAG pipeline actually reads. Inspect failures by hand, because a model that retrieves broadly related documents can still miss exact identifiers, dates, or domain phrasing. Benchmark leaderboards are a starting filter, not a substitute for testing on your corpus.
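Recall at a cutoff is simple enough to compute by hand. The following is a minimal sketch, assuming you have stored each query's ranked results and a set of expected matches; the query and document IDs are made up for illustration.

```python
from typing import Dict, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each query needs at least one expected match")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    results: Dict[str, Sequence[str]], judgments: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every judged query."""
    per_query = [recall_at_k(results.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(per_query) / len(per_query)


# Hypothetical ranked results and expected matches.
results = {"q1": ["d1", "d7", "d3"], "q2": ["d9", "d2", "d4"]}
judgments = {"q1": {"d1", "d3"}, "q2": {"d2", "d5"}}

# Set k to the number of results your UI or RAG pipeline actually reads.
print(mean_recall_at_k(results, judgments, k=3))
```

Queries with low per-query recall are the ones worth inspecting by hand.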

Question 5

What has to happen when I switch embedding models?

Accepted Answer

Vectors from different models are not compatible, so switching is a rebuild, not a swap. You re-embed the documents, rebuild the index, and often retune thresholds, rerankers, and hybrid weights, because distance scores no longer mean the same thing. Keep the old index available during validation so you can compare query behavior before cutting traffic over. Re-embedding the whole corpus is often the slowest step, which is why fast serving with Text Embeddings Inference or Infinity makes a model change far less painful.
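Comparing query behavior between the old and new index can start with a simple overlap check. This sketch assumes you have run the same queries against both indexes and saved the ranked IDs; the function names, sample data, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Sequence


def topk_overlap(old_ids: Sequence[str], new_ids: Sequence[str], k: int) -> float:
    """Jaccard overlap between the top-k results of the old and new index."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    union = old_top | new_top
    return len(old_top & new_top) / len(union) if union else 1.0


def queries_to_review(
    old_results: Dict[str, Sequence[str]],
    new_results: Dict[str, Sequence[str]],
    k: int,
    threshold: float,
) -> List[str]:
    """Return queries whose top-k results changed more than the threshold allows."""
    return sorted(
        q for q in old_results
        if topk_overlap(old_results[q], new_results.get(q, []), k) < threshold
    )


old = {"refund policy": ["a", "b", "c"], "error E1234": ["x", "y", "z"]}
new = {"refund policy": ["a", "c", "b"], "error E1234": ["x", "q", "r"]}
flagged = queries_to_review(old, new, k=3, threshold=0.5)
print(flagged)
```

Low overlap is not automatically a regression, since the new model may be right, so treat the flagged queries as a review list and not a pass or fail signal.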

Question 6

Can I make embeddings run fast on CPU or at the edge?

Accepted Answer

Yes, and this is Model2Vec's purpose. It distills any sentence transformer into a small static model, reducing size by up to 50x and CPU inference time by up to 500x for a modest accuracy drop, and distillation can run from just a vocabulary and a model with no dataset. For laptops, phones, or CPU-only servers, a distilled static model is usually more realistic than a full transformer, though you should measure the accuracy tradeoff on your data.
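To see why a static model is cheap at inference time, it helps to look at the general idea: a per-token vector lookup followed by averaging, with no transformer forward pass. The toy below illustrates that idea only and is not Model2Vec's code. The table values are invented, and splitting on whitespace stands in for a real subword tokenizer.

```python
from typing import Dict, List

# Hypothetical precomputed token vectors; a real static model ships a much
# larger table produced by distillation.
STATIC_TABLE: Dict[str, List[float]] = {
    "fast": [1.0, 0.0],
    "cpu": [0.0, 1.0],
    "search": [1.0, 1.0],
}


def embed_static(text: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Embed text as the mean of its known token vectors (zeros if none match)."""
    vectors = [table[tok] for tok in text.lower().split() if tok in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


vec = embed_static("Fast CPU search", STATIC_TABLE, dim=2)
print(vec)
```

Because the work per text is a handful of lookups and additions, the cost scales with text length and not with model depth, which is where the CPU speedup comes from.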

Question 7

What happens to my vectors if an embedding tool stops being maintained?

Accepted Answer

Existing vectors keep working as long as you have pinned the weights, tokenizer, license rights, and deployment environment. The real risks are dependency security fixes drying up and weaker performance on new kinds of data, not sudden breakage. Keep a reproducible build and an exportable corpus pipeline so you can re-embed with another tool, and avoid preprocessing that only exists inside one service wrapper. A pinned, documented setup can keep running well past the last release, as long as you track security fixes for its dependencies yourself.
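Pinning is easier to verify when it is written down in a machine-readable form. The sketch below shows one possible manifest layout, which is an assumption and not a standard. The model ID, revision, file names, and preprocessing keys are all hypothetical, and the file contents are in-memory placeholders where you would read real files from disk.

```python
import hashlib
import json
from typing import Dict


def build_manifest(
    model_id: str, revision: str, files: Dict[str, bytes], preprocessing: Dict[str, object]
) -> Dict[str, object]:
    """Record what produced the vectors: model revision, file hashes, preprocessing."""
    return {
        "model_id": model_id,
        "revision": revision,
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        },
        "preprocessing": preprocessing,
    }


manifest = build_manifest(
    model_id="example-org/example-embedder",  # hypothetical model name
    revision="0123abc",                       # hypothetical pinned revision
    files={
        "model.safetensors": b"weights-bytes",   # placeholder for real file bytes
        "tokenizer.json": b"tokenizer-bytes",    # placeholder for real file bytes
    },
    preprocessing={"lowercase": False, "max_tokens": 512},
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Storing this file next to the index lets you check later that a rebuilt environment uses the same weights, tokenizer, and preprocessing that created the stored vectors.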
