Whisper

General-purpose speech recognition model for multilingual transcription, translation, and language identification

Repository activity

Stars: 102.7k
Forks: 12.5k
Open Issues: 127


License

MIT

Languages

Python

Get it: PyPI, GitHub

About Whisper

Whisper is a speech recognition model that transcribes audio, translates speech, and identifies the spoken language. It is designed for general-purpose use on diverse audio, and a single model can replace a traditional multi-stage speech-processing pipeline.

Whisper is a Transformer sequence-to-sequence model trained jointly on four tasks: multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. The command-line tool transcribes audio files and accepts a language setting; the Python API loads a model and returns the transcribed text for an audio input.
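The Python workflow described above (load a model, pass it audio, read back text) can be sketched roughly as follows. This is a minimal illustration, not the project's own example: the helper name `transcribe_file` and its optional `model` parameter are hypothetical, and the file name in the usage note is a placeholder. It assumes the `openai-whisper` package is installed and relies on its `whisper.load_model` and `model.transcribe` calls.

```python
from typing import Any, Optional


def transcribe_file(path: str,
                    model_name: str = "base",
                    language: Optional[str] = None,
                    model: Any = None) -> str:
    """Return the transcribed text for one audio file.

    `model` is a hypothetical convenience parameter: pass an already
    loaded model to avoid reloading the weights on every call.
    """
    if model is None:
        # Imported lazily so the helper can also be used with a stand-in model.
        import whisper
        model = whisper.load_model(model_name)

    # Only forward the language when the caller set one; otherwise the
    # model is left to detect it.
    options = {}
    if language is not None:
        options["language"] = language

    result = model.transcribe(path, **options)
    # The result is a dictionary; the full transcript is under "text".
    return result["text"].strip()
```

A call such as `transcribe_file("audio.mp3", language="en")` would load the `base` model and return the transcript as a string, with `audio.mp3` standing in for a real file.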

Whisper's code and model weights are released under the MIT License. It is installed as a Python package, and the command-line tool requires ffmpeg. Several model sizes are available, trading accuracy against speed and memory use.
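One way to act on the size trade-off is to pick the largest model that fits a memory budget. The sketch below is illustrative only: the function `pick_model_size` is hypothetical, and the memory figures are approximate assumptions that should be checked against the project's README before being relied on.

```python
# Approximate memory needs in GB, ordered from smallest to largest model.
# These figures are assumptions for illustration; verify them upstream.
APPROX_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model_size(available_gb: float) -> str:
    """Return the largest listed size whose approximate need fits the budget."""
    chosen = None
    for name, needed_gb in APPROX_MEMORY_GB:
        if needed_gb <= available_gb:
            chosen = name  # later entries are larger, so keep overwriting
    if chosen is None:
        raise ValueError(f"no listed model fits in {available_gb} GB")
    return chosen
```

Larger sizes are generally more accurate but slower, so a budget-based choice like this favors accuracy within the memory available; a latency-sensitive application might deliberately choose a smaller size instead.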

Key features

Multilingual speech recognition
Speech translation
Language identification
Voice activity detection
Command-line transcription from audio files

Details

First released: 2022
Platforms: CLI
Deployment: Offline-first
License: MIT
Runtime: Python 3.8-3.11
Dependency: ffmpeg
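
The runtime and dependency entries above can be checked before installation with a short preflight script. This is a sketch under stated assumptions: the function names are hypothetical, the supported range is taken from the "Runtime" line, and ffmpeg is assumed to be discoverable on the `PATH` under the name `ffmpeg`.

```python
import shutil
import sys

# Supported interpreter range, taken from the "Runtime" entry above.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=None) -> bool:
    """Check whether a (major, minor) version falls in the listed range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def ffmpeg_available() -> bool:
    """Check whether an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


def preflight() -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if not python_supported():
        problems.append("Python version is outside the listed 3.8-3.11 range")
    if not ffmpeg_available():
        problems.append("ffmpeg was not found on the PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```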