Chatterbox

Open-source text-to-speech models with multilingual voice cloning and built-in watermarking

Repository activity

Stars: 25.1k
Forks: 3.3k
Open Issues: 341

License

MIT

Languages

Python

Get it: Website · GitHub

About Chatterbox

Chatterbox is a family of open-source text-to-speech models from Resemble AI. It turns text into speech for voice cloning, multilingual narration, and low-latency voice agents.

The multilingual model covers 23 languages: Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, and Chinese. Chatterbox Turbo is a 350M-parameter model that supports native paralinguistic tags such as [cough], [laugh], and [chuckle] and clones a voice from a reference audio clip.
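
Because the tags are written inline in the input text, it can help to check a script for unrecognised tags before sending it to the model. The sketch below is a hypothetical helper, not part of Chatterbox: the names KNOWN_TAGS and split_tags are made up for illustration, and the tag set contains only the three tags named above, which may not be the full list Turbo supports.

```python
import re

# Only the tags named in the description above; the full set is an open question here.
KNOWN_TAGS = {"cough", "laugh", "chuckle"}

# Matches lowercase bracketed tags such as [laugh]; the capture group is the tag name.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def split_tags(text):
    """Return (known, unknown) bracketed tag names found in text, in order of appearance."""
    known, unknown = [], []
    for name in TAG_PATTERN.findall(text):
        (known if name in KNOWN_TAGS else unknown).append(name)
    return known, unknown


if __name__ == "__main__":
    known, unknown = split_tags("Well [chuckle] that went badly [sigh].")
    print("known:", known)
    print("unknown:", unknown)
```

Anything that lands in the unknown list is worth checking against the project's own documentation before synthesis.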

Chatterbox embeds Resemble AI's PerTh perceptual watermarker in every generated audio file. The watermark is designed to survive MP3 compression and common editing while staying imperceptible. The code and models are published by Resemble AI; you can install the package with pip as chatterbox-tts or run it from source for local, offline generation.
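
A minimal usage sketch, assuming the package was installed with `pip install chatterbox-tts`. The names ChatterboxTTS, from_pretrained, generate, audio_prompt_path, and sr follow the project's README as of this writing and may differ between versions; the wrapper functions generate_kwargs and synthesize are hypothetical helpers added here for illustration.

```python
def generate_kwargs(reference_clip=None):
    """Build the optional arguments for generate(); empty when no voice is cloned."""
    kwargs = {}
    if reference_clip is not None:
        # Assumption: the reference clip is passed as a file path under this keyword.
        kwargs["audio_prompt_path"] = str(reference_clip)
    return kwargs


def synthesize(text, out_path, reference_clip=None, device="cpu"):
    """Generate speech for text and write it to out_path as an audio file."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    # Model weights are fetched on first use; later runs can work offline.
    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **generate_kwargs(reference_clip))
    torchaudio.save(str(out_path), wav, model.sr)
    return out_path


if __name__ == "__main__":
    # "reference.wav" is a placeholder path, not a file shipped with the package.
    synthesize("Hello from Chatterbox.", "hello.wav", reference_clip="reference.wav")
```

Leaving reference_clip unset generates speech without a cloned voice; pass `device="cuda"` if a GPU is available.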

Key features

Multilingual TTS across 23 listed languages
Voice cloning from a reference audio clip
Turbo model with native paralinguistic tags
PerTh watermarking in generated audio
Single Language Pack for language-specific finetunes

Details

First released: 2025
Platforms: Web · CLI
Deployment: self-hostable · offline-first
Watermarking: PerTh (Perceptual Threshold)
Model size: 0.5B · Turbo 350M
Languages: 23 listed languages