Whisper

General-purpose speech recognition model for multilingual transcription, translation, and language identification

Repository activity
  • Stars: 102.7k
  • Forks: 12.5k
  • Open Issues: 127
License

MIT

Languages
  • Python
Get it: PyPI, GitHub

About Whisper

Whisper is a speech recognition model that transcribes audio, translates speech, and identifies the spoken language. It is designed for general-purpose use on diverse audio, and a single model can replace the separate stages of a traditional speech-processing pipeline.

It is a Transformer sequence-to-sequence model trained on multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. The command-line tool transcribes audio files and accepts an optional language setting; the Python API loads a model and returns the text transcribed from an audio input.
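The Python API described above can be sketched roughly as follows. This is a minimal example, assuming the package is installed (for instance with `pip install openai-whisper`) and that ffmpeg is on the PATH. `whisper.load_model` and `model.transcribe` are the package's own calls; the wrapper name `transcribe_file`, the default model name "base", and the way `language` is passed through are illustrative choices, not part of the library.

```python
def transcribe_file(audio_path, model_name="base", language=None):
    """Return the text transcribed from one audio file.

    transcribe_file is a hypothetical helper; only load_model and
    transcribe belong to the whisper package.
    """
    # Imported inside the function so that importing this module does not
    # require the package (and its model download) up front.
    import whisper

    model = whisper.load_model(model_name)
    # language=None leaves language detection to the model; a code such
    # as "ja" skips detection and decodes in that language.
    result = model.transcribe(audio_path, language=language)
    return result["text"]
```

Calling `transcribe_file("audio.mp3")` should return the transcript as a single string; the dictionary returned by `model.transcribe` also carries per-segment details that this sketch ignores.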

Whisper's code and model weights are released under the MIT License. It is installed as a Python package from PyPI, and the command-line tool requires ffmpeg. Several model sizes are available, trading off accuracy against speed and memory: larger models are more accurate but slower and need more memory.
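One way to drive the command-line tool from a script, with the ffmpeg requirement checked first, is sketched below. The `--model`, `--language`, and `--task translate` flags are taken from the project's README as I understand it; the helper names `build_whisper_command` and `run_whisper` and the default size "base" are assumptions made for illustration.

```python
import shutil
import subprocess


def build_whisper_command(audio_path, model="base", language=None, translate=False):
    """Build the argument list for one `whisper` CLI invocation."""
    cmd = ["whisper", audio_path, "--model", model]
    if language is not None:
        cmd += ["--language", language]
    if translate:
        # Ask for an English translation instead of a same-language transcript.
        cmd += ["--task", "translate"]
    return cmd


def run_whisper(audio_path, **options):
    """Run the CLI, failing early when ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; the whisper CLI needs it")
    return subprocess.run(build_whisper_command(audio_path, **options), check=True)
```

For example, `build_whisper_command("talk.mp3", model="small", language="Japanese")` yields the equivalent of `whisper talk.mp3 --model small --language Japanese`. Picking a smaller model here is the simplest way to trade accuracy for speed and memory.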

Key features

  • Multilingual speech recognition
  • Speech translation
  • Language identification
  • Voice activity detection
  • Command-line transcription from audio files
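
Language identification can also be reached through the lower-level Python calls. The sketch below follows the pattern shown in the project's README, under the assumption of a recent package version in which `log_mel_spectrogram` accepts `n_mels`; the wrapper name `identify_language` and the default model name are hypothetical.

```python
def identify_language(audio_path, model_name="base"):
    """Return (language_code, probability) for the most likely language."""
    import whisper  # assumes the openai-whisper package and ffmpeg are installed

    model = whisper.load_model(model_name)
    # Load the audio and pad or trim it to the 30-second window the model expects.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    # Compute the log-Mel spectrogram on the same device as the model.
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    # detect_language returns language tokens and a code -> probability mapping.
    _, probs = model.detect_language(mel)
    best = max(probs, key=probs.get)
    return best, probs[best]
```

Only the first 30 seconds of audio are considered in this sketch, so a recording that switches language later on would be labelled by its opening.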

Details

First released: 2022
Platforms: CLI
Deployment: Offline-first
License: MIT
Runtime: Python 3.8-3.11
Dependency: ffmpeg