Vosk

Offline speech recognition toolkit with streaming transcription, small models, and speaker identification

Repository activity
  • Stars: 14.8k
  • Forks: 1.7k
  • Open issues: 594
License

Apache-2.0

Languages
  • Jupyter Notebook
  • C++
  • Kotlin
Get it: PyPI · GitHub
About Vosk

Vosk is an open-source speech recognition toolkit that converts speech to text entirely offline, with no network connection required. It supports 20+ languages and dialects and is built for uses such as chatbots, smart home devices, virtual assistants, subtitles, and transcription.

It provides continuous large-vocabulary transcription, a streaming API with zero-latency response, a vocabulary that can be reconfigured, and speaker identification. Speech recognition bindings are available for Python, Java, Node.js, C#, C++, Rust, Go, and other languages.
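As a rough illustration of the streaming API, the sketch below feeds audio to a recognizer chunk by chunk and reports partial hypotheses while an utterance is still in progress. It assumes the Python binding (`pip install vosk`) and a model directory that has already been downloaded. The helper names `stream_transcribe` and `transcribe_wav`, the 4000-frame chunk size, and the file paths are illustrative choices, not part of Vosk itself.

```python
import json


def stream_transcribe(recognizer, chunks):
    """Feed raw PCM chunks to a recognizer and yield (kind, text) events.

    `recognizer` is any object exposing AcceptWaveform, Result,
    PartialResult and FinalResult, as vosk.KaldiRecognizer does.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # The recognizer detected the end of an utterance.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield ("final", text)
        else:
            # Still inside an utterance: report the running hypothesis.
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                yield ("partial", partial)
    # Flush whatever audio is still buffered once the input ends.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)


def transcribe_wav(wav_path, model_dir, frames_per_chunk=4000):
    """Print streaming events for a WAV file (assumed 16-bit mono PCM)."""
    import wave
    # Imported lazily so the helper above stays usable without Vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        # readframes returns b"" at end of file, which stops the iterator.
        chunks = iter(lambda: wf.readframes(frames_per_chunk), b"")
        for kind, text in stream_transcribe(recognizer, chunks):
            print(kind, text)
```

Calling `transcribe_wav("speech.wav", "model")` would print events as they arrive; both arguments are placeholder paths. Keeping the event loop separate from model loading means the same loop can be driven by a microphone or a network socket instead of a file.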

Vosk scales from small devices such as the Raspberry Pi and Android phones up to large server clusters. The models are compact, around 50 MB each, which keeps Vosk usable on constrained hardware without giving up continuous large-vocabulary transcription.

Key features

  • Offline speech recognition
  • 20+ languages and dialects
  • Continuous large-vocabulary transcription
  • Streaming API with zero-latency response
  • Reconfigurable vocabulary and speaker identification
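
The reconfigurable vocabulary can be sketched in Python as follows. In the Python binding, `KaldiRecognizer` accepts an optional third argument: a JSON array of allowed phrases, where `"[unk]"` stands for anything outside the list. The helpers `build_grammar` and `make_command_recognizer` are hypothetical names. Lowercasing the phrases is an assumption that suits the English models, and restricting the vocabulary at runtime is assumed to need a model that supports it (typically the small ones).

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Return a JSON array string usable as a KaldiRecognizer grammar."""
    cleaned = []
    for phrase in phrases:
        # Assumption: the model vocabulary is lowercase, single-spaced words.
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized not in cleaned:
            cleaned.append(normalized)
    if allow_unknown:
        # "[unk]" lets the recognizer report out-of-grammar speech as unknown.
        cleaned.append("[unk]")
    return json.dumps(cleaned)


def make_command_recognizer(model_dir, sample_rate, phrases):
    """Create a recognizer limited to the given phrases (requires vosk)."""
    # Imported lazily so build_grammar stays usable without Vosk installed.
    from vosk import Model, KaldiRecognizer

    return KaldiRecognizer(Model(model_dir), sample_rate, build_grammar(phrases))
```

For a voice-command device, `make_command_recognizer("model", 16000, ["turn on", "turn off"])` would yield a recognizer that only reports those two phrases or `[unk]`; the model path and sample rate here are placeholders.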

Details

First released
2019
Platforms
Android · iOS · Linux
Deployment
offline-first
Language bindings
Python · Java · Node.js · C# · C++ · Rust · Go
Models
Small models, about 50 MB each
Scales
Raspberry Pi to big clusters