Offline speech recognition toolkit with streaming transcription, small models, and speaker identification
- Stars: 14.8k
- Forks: 1.7k
- Open Issues: 594
- License: Apache-2.0
- Jupyter Notebook
- C++
- Kotlin

About Vosk
Vosk is an offline open source speech recognition toolkit for converting speech to text without a network connection. It supports 20+ languages and dialects and is built for uses such as chatbots, smart home devices, virtual assistants, subtitles, and transcription.
It provides continuous large vocabulary transcription, zero-latency streaming API responses, reconfigurable vocabulary, and speaker identification. Speech recognition bindings are available for Python, Java, Node.JS, C#, C++, Rust, Go, and other languages.
Vosk scales from small devices like Raspberry Pi and Android phones up to large server clusters. The models are compact, around 50 MB each, which keeps Vosk usable on constrained hardware.
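The streaming behaviour described above can be sketched with the Python bindings. This is a minimal sketch, not the project's reference code: it assumes the `vosk` package is installed, that a model has been unpacked to a local directory, and that the input is a mono 16-bit PCM WAV file. The helper names `stream_transcribe`, `read_wav_chunks`, and `transcribe_file` are hypothetical and chosen only for illustration.

```python
import json
import wave


def stream_transcribe(recognizer, chunks):
    """Feed audio chunks to a recognizer and yield (kind, text) pairs as they arrive."""
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # The recognizer detected the end of an utterance: emit the finalized text.
            yield "final", json.loads(recognizer.Result()).get("text", "")
        else:
            # Still mid-utterance: emit the current best guess.
            yield "partial", json.loads(recognizer.PartialResult()).get("partial", "")
    # Flush whatever audio is still buffered once the stream ends.
    yield "final", json.loads(recognizer.FinalResult()).get("text", "")


def read_wav_chunks(path, frames_per_chunk=4000):
    """Yield raw PCM byte chunks from a WAV file (assumed mono, 16-bit)."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_file(wav_path, model_dir):
    """Print partial and final results for one file; both paths are placeholders."""
    # Imported here so the helpers above work without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        sample_rate = wf.getframerate()
    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)
    for kind, text in stream_transcribe(recognizer, read_wav_chunks(wav_path)):
        if text:
            print(kind, text)
```

Partial results let a caller update a display while the speaker is still talking, and each final result closes one utterance. The same loop should work with microphone input as long as the chunks are raw PCM at the sample rate given to the recognizer.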
Key features
- Offline speech recognition
- 20+ languages and dialects
- Continuous large vocabulary transcription
- Streaming API with zero-latency response
- Reconfigurable vocabulary and speaker identification
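As an illustration of the reconfigurable vocabulary feature, the Python bindings accept an optional third argument to `KaldiRecognizer`: a JSON array of allowed phrases, where the `[unk]` entry stands for anything outside the list. The sketch below is hedged: `build_grammar` and `make_command_recognizer` are hypothetical helpers, lowercasing assumes a model whose vocabulary is lowercase, and runtime vocabulary restriction is only expected to work with models that support it (typically the small ones).

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Return a JSON array string listing the phrases the recognizer may output."""
    # Assumption: the model's vocabulary is lowercase, so phrases are normalized.
    items = [p.strip().lower() for p in phrases if p.strip()]
    if allow_unknown:
        # "[unk]" lets the recognizer report out-of-list speech instead of forcing a match.
        items.append("[unk]")
    return json.dumps(items)


def make_command_recognizer(model_dir, sample_rate, phrases):
    """Create a recognizer limited to the given phrases; model_dir is a placeholder path."""
    # Imported here so build_grammar works without vosk installed.
    from vosk import Model, KaldiRecognizer

    return KaldiRecognizer(Model(model_dir), sample_rate, build_grammar(phrases))
```

For example, `build_grammar(["Turn on", "turn off"])` produces a grammar with two commands plus `[unk]`, which suits a smart home device that only needs to react to a handful of phrases.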
Details
- First released: 2019
- Platforms: Android · iOS · Linux
- Deployment: offline-first
- Languages: Python · Java · Node.JS · C# · C++ · Rust · Go
- Models: small, about 50 MB each
- Scales: Raspberry Pi to large clusters
