Python toolkit for turning sentence transformers into small, fast static embedding models
- Stars: 2.1k
- Forks: 121
- Open issues: 3
- License: MIT
- Languages: Python, Jupyter Notebook, Makefile

About Model2Vec
Model2Vec is a technique and Python package for turning any sentence transformer into a small, fast static embedding model. It targets embedding use cases where full sentence transformer inference is too large or slow, reducing model size by up to 50x and CPU inference time by up to 500x with a small drop in performance.
It works by forwarding a vocabulary through a sentence transformer to create static token embeddings, then applying post-processing and optional pre-training. The StaticModel API can load models from the Hugging Face Hub with from_pretrained, encode text, and return token embedding sequences. Distillation can run without a dataset, using only a vocabulary and model.
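The idea described above can be illustrated with a toy sketch in plain numpy. This is not the Model2Vec implementation: `toy_encoder` is a hypothetical stand-in for a sentence transformer, the vocabulary is made up, tokenization is a simple whitespace split, and PCA is assumed here as one plausible post-processing step. The helper names `toy_encode` and `toy_encode_tokens` are also illustrative and are not part of the StaticModel API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a sentence transformer: returns one random
# 16-dimensional vector per token. A real encoder would be a neural network.
def toy_encoder(tokens, dim=16):
    return rng.normal(size=(len(tokens), dim))

# Made-up vocabulary; "[UNK]" catches words that are not in it.
vocab = ["static", "embeddings", "are", "fast", "small", "models", "[UNK]"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Step 1: forward the whole vocabulary through the encoder once.
# No dataset is involved, only a vocabulary and a model.
table = toy_encoder(vocab)  # shape (7, 16)

# Step 2 (assumed post-processing): center the table and reduce its
# dimensionality with PCA, computed via SVD.
def pca_reduce(matrix, n_components):
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

static_table = pca_reduce(table, n_components=4)  # shape (7, 4)

def toy_encode_tokens(text):
    """Return one static vector per token (a token embedding sequence)."""
    unk = token_to_id["[UNK]"]
    ids = [token_to_id.get(tok, unk) for tok in text.lower().split()]
    return static_table[ids]

def toy_encode(text):
    """Return a single text vector by mean-pooling the token vectors."""
    token_vectors = toy_encode_tokens(text)
    if len(token_vectors) == 0:
        return np.zeros(static_table.shape[1])
    return token_vectors.mean(axis=0)

token_vectors = toy_encode_tokens("static models are fast")
sentence_vector = toy_encode("static models are fast")
```

Because encoding reduces to a table lookup followed by an average, no transformer forward pass is needed at inference time, which is where the size and CPU speed gains come from.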
Pre-trained Model2Vec models are available on Hugging Face, including potion-base-32M as well as multilingual, retrieval, and smaller base variants. The base package depends mainly on numpy; optional training extras add support for fine-tuning classification models on top of Model2Vec models. It is licensed under MIT.
Key features
- Distills sentence transformers into static embedding models
- Encodes text and token sequences with StaticModel
- Dataset-free distillation using a vocabulary and model
- Fine-tunes classification models on Model2Vec embeddings
- Loads and pushes models through the Hugging Face Hub
Details
- First released: 2024
- Language: Python
- Model size: Up to 50x smaller
- Inference: Up to 500x faster on CPU
- Dependencies: Base package mainly numpy
- License: MIT
