Vosk

Offline speech recognition toolkit with streaming transcription, small models, and speaker identification

Repository activity

Stars: 14.8k
Forks: 1.7k
Open issues: 594

License

Apache-2.0

Languages

Jupyter Notebook
C++
Kotlin

Get it: PyPI · GitHub

About Vosk

Vosk is an open source toolkit for offline speech recognition: it converts speech to text without a network connection. It supports 20+ languages and dialects and is built for uses such as chatbots, smart home devices, virtual assistants, subtitles, and transcription.

It provides continuous large-vocabulary transcription, a streaming API with zero-latency responses, reconfigurable vocabulary, and speaker identification. Speech recognition bindings are available for Python, Java, Node.js, C#, C++, Rust, Go, and other languages.
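The streaming API follows a feed-and-poll pattern. In the Python binding, a `KaldiRecognizer(model, sample_rate)` accepts raw 16-bit mono PCM chunks through `AcceptWaveform(data)`, which returns `True` once an utterance is complete; `Result()`, `PartialResult()` and `FinalResult()` return JSON strings with `"text"` or `"partial"` fields. The sketch below assumes that interface and takes the recognizer as a parameter, so it runs against any object exposing those four methods. `transcribe_stream` and `on_partial` are hypothetical names for illustration, not part of Vosk.

```python
import json
from typing import Callable, Iterable, List, Optional, Protocol


class Recognizer(Protocol):
    """The subset of the recognizer interface this sketch relies on (assumed)."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def PartialResult(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe_stream(
    rec: Recognizer,
    chunks: Iterable[bytes],
    on_partial: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Feed audio chunks to a streaming recognizer and collect finished utterances.

    A True return from AcceptWaveform() is treated as "an utterance just ended";
    otherwise the in-progress hypothesis is read from PartialResult().
    """
    utterances: List[str] = []
    for chunk in chunks:
        if rec.AcceptWaveform(chunk):
            text = json.loads(rec.Result()).get("text", "")
            if text:
                utterances.append(text)
        elif on_partial is not None:
            # Partial hypotheses arrive while the speaker is still talking.
            on_partial(json.loads(rec.PartialResult()).get("partial", ""))
    # Flush whatever audio is still buffered once the stream ends.
    text = json.loads(rec.FinalResult()).get("text", "")
    if text:
        utterances.append(text)
    return utterances
```

With the real package, `rec` would come from something like `KaldiRecognizer(Model("model"), wf.getframerate())` and the chunks from `iter(lambda: wf.readframes(4000), b"")` over a mono WAV file opened with the standard `wave` module; the model path and chunk size here are placeholders. Passing the recognizer in, rather than constructing it inside, keeps the loop testable without downloading a model.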

Vosk scales from small devices like Raspberry Pi and Android phones up to large server clusters. The small models are compact, around 50 MB each, which keeps Vosk usable on constrained hardware while still supporting continuous large-vocabulary transcription.

Key features

Offline speech recognition
20+ languages and dialects
Continuous large vocabulary transcription
Streaming API with zero-latency response
Reconfigurable vocabulary and speaker identification
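Both features are driven by small data values in the Python binding, as far as I know: a recognizer can be created with a third argument that is a JSON list of allowed phrases, e.g. `KaldiRecognizer(model, 16000, '["yes no", "[unk]"]')` (reportedly only models with a dynamic graph, such as the small ones, accept this), and when a speaker model is attached via `SpkModel` and `rec.SetSpkModel(...)`, results carry an `"spk"` embedding vector. The sketch below builds such a phrase list and compares two embeddings by cosine distance. The helper names `grammar_json` and `cosine_distance` are hypothetical, and any "same speaker" cutoff on the distance would have to be calibrated on your own recordings rather than taken from Vosk.

```python
import json
import math
from typing import Iterable, Sequence


def grammar_json(phrases: Iterable[str], allow_unknown: bool = True) -> str:
    """Build a JSON list of allowed phrases to restrict the recognizer's vocabulary.

    Appending "[unk]" lets out-of-grammar speech map to an unknown token instead
    of being forced onto the closest allowed phrase (assumed convention).
    """
    items = [p.strip() for p in phrases if p.strip()]
    if allow_unknown:
        items.append("[unk]")
    return json.dumps(items)


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two speaker embeddings.

    0.0 means the vectors point the same way; larger values mean less similar voices.
    """
    if len(x) != len(y):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return 1.0 - dot / (norm_x * norm_y)
```

A typical use would enroll one reference embedding per known speaker and label each new utterance with the closest reference, rejecting matches whose distance exceeds the calibrated cutoff.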

Details

First released: 2019
Platforms: Android · iOS · Linux
Deployment: offline-first
Languages: Python · Java · Node.js · C# · C++ · Rust · Go
Models: Small models, about 50 MB
Scales: Raspberry Pi to big clusters