Model2Vec

Python toolkit for turning sentence transformers into small, fast static embedding models

Repository activity

Stars: 2.1k
Forks: 121
Open issues: 3

License

MIT

Languages

Python
Jupyter Notebook
Makefile


About Model2Vec

Model2Vec is a technique and Python package for turning any sentence transformer into a small, fast static embedding model. It targets embedding use cases where full sentence transformer inference is too large or too slow, making models up to 50x smaller and CPU inference up to 500x faster at the cost of a small drop in performance.
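A static model is essentially a token embedding table, so its storage follows from simple arithmetic: vocabulary size times embedding dimensions times bytes per value. The sketch below is illustrative only. The function name static_table_bytes and the 32,000-token vocabulary are hypothetical and not taken from any released Model2Vec model.

```python
def static_table_bytes(vocab_size: int, dims: int, bytes_per_value: int = 4) -> int:
    """Storage for a static embedding table: one dims-long vector per token."""
    return vocab_size * dims * bytes_per_value


# Hypothetical sizes chosen for illustration, not read from any real model.
configs = [(256, 4, "float32"), (256, 2, "float16"), (128, 4, "float32")]
for dims, nbytes, label in configs:
    size = static_table_bytes(32_000, dims, nbytes)
    print(f"32,000 tokens x {dims} dims ({label}): {size / 1e6:.1f} MB")
```

Halving either the number of dimensions or the bytes per value halves the table, which is why dimensionality reduction during post-processing directly shrinks the model.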

It works by passing a vocabulary through a sentence transformer to create static token embeddings, then applying post-processing and optional pre-training. The StaticModel API can load models from the Hugging Face Hub with from_pretrained, encode text into sentence embeddings, and return per-token embedding sequences. Distillation needs no dataset, only a vocabulary and a sentence transformer.
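The dataset-free distillation described above can be sketched with numpy alone. This is a sketch under stated assumptions, not Model2Vec's implementation. fake_teacher_encode is a hypothetical stand-in that returns a fixed pseudo-random vector per token instead of running a sentence transformer. The post-processing is assumed to be PCA followed by SIF-style weighting, with token frequencies estimated from vocabulary rank via Zipf's law, so the vocabulary must be ordered from most to least frequent. The coefficient a=1e-4 is an assumption, and the optional pre-training step is not covered.

```python
import zlib

import numpy as np


def fake_teacher_encode(tokens, dim=64):
    # Hypothetical stand-in for a sentence transformer: a deterministic
    # pseudo-random vector per token, seeded by a stable CRC32 of the token.
    vecs = [np.random.default_rng(zlib.crc32(t.encode("utf-8"))).standard_normal(dim)
            for t in tokens]
    return np.stack(vecs)


def pca_reduce(emb, k):
    # Center, then project onto the top-k right singular vectors (PCA).
    centered = emb - emb.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def sif_weights(vocab_size, a=1e-4):
    # Zipf assumption: the token at rank r has probability proportional to 1/r.
    inv_rank = 1.0 / np.arange(1, vocab_size + 1)
    p = inv_rank / inv_rank.sum()
    # SIF weighting: frequent (low-rank) tokens receive smaller weights.
    return a / (a + p)


def distill_static(vocab, teacher, k, a=1e-4):
    raw = teacher(vocab)                  # (V, d): one vector per vocabulary token
    reduced = pca_reduce(raw, k)          # (V, k)
    return reduced * sif_weights(len(vocab), a)[:, None]


# Hypothetical vocabulary, assumed sorted from most to least frequent.
vocab = [f"token_{i}" for i in range(20)]
table = distill_static(vocab, fake_teacher_encode, k=8)
print(table.shape)  # (20, 8)
```

The resulting table is the whole model: encoding later only needs row lookups into it, and no example sentences were required to build it.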

Model2Vec provides pre-trained models on Hugging Face, such as potion-base-32M, along with multilingual, retrieval, and smaller base variants. The base package depends mainly on numpy; optional training extras support fine-tuning classification models on top of Model2Vec models. It is licensed under MIT.
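A classifier on top of static embeddings can be as small as a linear head. The sketch below is a simplified stand-in, not the package's training extras, which may work differently (for example, by also updating the embeddings). It keeps the embeddings frozen and trains a softmax-regression head with full-batch gradient descent. train_linear_head, predict, and the synthetic two-cluster data standing in for sentence embeddings are all hypothetical.

```python
import numpy as np


def train_linear_head(X, y, n_classes, lr=0.1, epochs=200):
    """Softmax regression on frozen sentence embeddings (full-batch gradient descent)."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # row max shift for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)


# Synthetic stand-ins for embeddings a static model would produce (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 8)), rng.normal(2.0, 1.0, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
W, b = train_linear_head(X, y, n_classes=2)
accuracy = (predict(X, W, b) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Keeping the embeddings frozen makes training a few matrix products per step, which fits the numpy-only footprint of the base package; updating the embeddings as well would typically need an autodiff framework.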

Key features

Distills sentence transformers into static embedding models
Encodes text and token sequences with StaticModel
Dataset-free distillation using a vocabulary and model
Fine-tunes classification models on Model2Vec embeddings
Loads and pushes models through the Hugging Face Hub
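The encoding feature listed above can be illustrated with a toy class. TinyStaticModel is hypothetical and is not the library's StaticModel: its whitespace tokenizer and tiny two-dimensional vocabulary are assumptions, and its method names only loosely echo StaticModel's encode and per-token sequence output. A sentence embedding is the mean of the static vectors of its known tokens, and unknown tokens are skipped.

```python
import numpy as np


class TinyStaticModel:
    """Hypothetical toy illustrating static encoding: token lookup plus mean pooling."""

    def __init__(self, vocab, embeddings):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.embeddings = np.asarray(embeddings, dtype=float)  # (V, d) static table

    def _ids(self, text):
        # Assumption: lowercase whitespace tokenization; real models use subword tokenizers.
        ids = [self.index[t] for t in text.lower().split() if t in self.index]
        return np.asarray(ids, dtype=np.intp)

    def encode_as_sequence(self, texts):
        # One (n_known_tokens, d) array of token vectors per input text.
        return [self.embeddings[self._ids(t)] for t in texts]

    def encode(self, texts):
        out = np.zeros((len(texts), self.embeddings.shape[1]))
        for row, seq in enumerate(self.encode_as_sequence(texts)):
            if len(seq):
                out[row] = seq.mean(axis=0)  # mean pooling; texts with no known tokens stay zero
        return out


vocab = ["cats", "dogs", "purr", "bark"]
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
model = TinyStaticModel(vocab, emb)
print(model.encode(["Cats purr", "dogs bark loudly"]))  # rows: [1, 0] and [0, 1]
```

Because encoding is a table lookup and an average, with no attention layers, cost grows linearly with the number of tokens, which is why static models are cheap to run on CPU.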

Details

First released: 2024
Language: Python
Model size: Up to 50x smaller
Inference: Up to 500x faster on CPU
Dependencies: Mainly numpy (base package)
License: MIT