C/C++ speech-to-text inference for OpenAI's Whisper model with CPU, GPU, and on-device support
- Stars: 50.7k
- Forks: 5.7k
- Open Issues: 1.2k
- License: MIT
- Languages: C++ · C · CUDA

About whisper.cpp
whisper.cpp is a C/C++ implementation of OpenAI's Whisper automatic speech recognition model. It runs speech-to-text inference without heavy dependencies and can run fully offline on-device. The project is built around a small high-level implementation and a plain C-style API.
It supports mixed F16 and F32 precision, integer quantization, zero runtime memory allocations, and voice activity detection. Acceleration is available for Apple Silicon, x86 AVX, POWER VSX, Vulkan, NVIDIA CUDA, AMD ROCm, OpenVINO, Ascend NPU, and Moore Threads GPUs. Supported targets include macOS, iOS, Android, Linux, Windows, WebAssembly, Docker, and Raspberry Pi.
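To make "integer quantization" concrete, here is a minimal sketch of block-wise 8-bit quantization in C. It is an illustration only, not the project's actual code: the names `block_q8`, `quantize_q8`, `dequantize_q8`, and the block length `BLOCK_SIZE` are hypothetical and chosen for this example. The idea is that each block of floats shares one float scale, and each value is stored as a signed byte, so storage drops to roughly a quarter of F32 at the cost of a bounded rounding error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* hypothetical block length chosen for this sketch */

/* One quantized block: a shared scale plus one signed byte per value. */
typedef struct {
    float  scale;
    int8_t q[BLOCK_SIZE];
} block_q8;

/* Quantize n floats into n / BLOCK_SIZE blocks.
 * n must be a multiple of BLOCK_SIZE. */
static void quantize_q8(const float *src, block_q8 *dst, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    for (size_t b = 0; b < n / BLOCK_SIZE; ++b) {
        const float *x = src + b * BLOCK_SIZE;

        /* Find the largest magnitude in the block. */
        float amax = 0.0f;
        for (int i = 0; i < BLOCK_SIZE; ++i) {
            float a = x[i] < 0.0f ? -x[i] : x[i];
            if (a > amax) amax = a;
        }

        /* Map [-amax, amax] onto [-127, 127]. An all-zero block gets scale 0. */
        float scale = amax / 127.0f;
        float inv   = scale > 0.0f ? 1.0f / scale : 0.0f;
        dst[b].scale = scale;

        for (int i = 0; i < BLOCK_SIZE; ++i) {
            float v = x[i] * inv;
            /* Round half away from zero, then truncate to a byte. */
            dst[b].q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Reconstruct approximate floats from quantized blocks. */
static void dequantize_q8(const block_q8 *src, float *dst, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    for (size_t b = 0; b < n / BLOCK_SIZE; ++b) {
        for (int i = 0; i < BLOCK_SIZE; ++i) {
            dst[b * BLOCK_SIZE + i] = (float)src[b].q[i] * src[b].scale;
        }
    }
}
```

In this sketch, a round trip through `quantize_q8` and `dequantize_q8` changes each value by at most about half of that block's `scale`. Smaller blocks track local magnitude more closely but spend more bytes on scales.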
The project ships a CLI, examples, Java bindings, and a Docker image, and is built on top of the ggml machine learning library, with the model logic in whisper.h and whisper.cpp. It is released under the MIT License.
Key features
- Plain C/C++ implementation without dependencies
- Support for CPU-only inference, with zero memory allocations at runtime
- Mixed F16/F32 precision and integer quantization
- Voice Activity Detection (VAD)
- C-style API and command line tools
Details
- First released: 2022
- Platforms: Windows · macOS · Linux · Android · iOS
- Runtime: CPU · GPU · WebAssembly · Docker
- Inference: Offline, on-device speech recognition
- Acceleration: Metal · CUDA · ROCm · Vulkan · OpenVINO
- Quantization: Integer quantization support
