OpenVoice

Instant voice cloning model with tone color cloning, style control, and cross-lingual speech generation

Repository activity

Stars: 36.7k
Forks: 4.1k
Open Issues: 305

License

MIT

Languages

Python
Jupyter Notebook

Get it: GitHub · Website

About OpenVoice

OpenVoice is an open-source instant voice cloning system that generates speech from a short reference audio clip. It clones the reference speaker's tone color, generates speech in multiple languages and accents, and supports zero-shot cross-lingual voice cloning, in which neither the generated language nor the reference language needs to appear in the training dataset.

The model provides voice style controls for emotion, accent, rhythm, pauses, and intonation. OpenVoice V2 improves audio quality through a different training strategy and adds native multilingual support for English, Spanish, French, Chinese, Japanese, and Korean.

OpenVoice has powered the instant voice cloning feature in myshell.ai since May 2023. Its main contributors come from MIT, Tsinghua University, and MyShell. OpenVoice V1 and V2 are MIT licensed and free for commercial and research use.

Key features

Tone color cloning from short reference audio clips
Speech generation in multiple languages and accents
Emotion, accent, rhythm, pause, and intonation control
Zero-shot cross-lingual voice cloning
Native V2 support for English, Spanish, French, Chinese, Japanese, and Korean
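
As a small illustration of the language list above, the following sketch checks whether a language code is among those V2 supports natively. The names `V2_NATIVE_LANGUAGES` and `describe_language_support` are hypothetical helpers written for this example; they are not part of the OpenVoice package, and the two-letter codes simply mirror the abbreviations used in this listing.

```python
# Hypothetical helper for illustration only; not part of the OpenVoice API.
# The codes mirror the abbreviations used in this listing (EN, ES, FR, ZH, JA, KO).
V2_NATIVE_LANGUAGES = {
    "EN": "English",
    "ES": "Spanish",
    "FR": "French",
    "ZH": "Chinese",
    "JA": "Japanese",
    "KO": "Korean",
}


def describe_language_support(code: str) -> str:
    """Return a short label saying whether `code` is in the native V2 list."""
    normalized = code.strip().upper()  # tolerate lowercase input and stray spaces
    if normalized in V2_NATIVE_LANGUAGES:
        return f"{V2_NATIVE_LANGUAGES[normalized]}: native V2 support"
    return f"{normalized}: not in the native V2 list"


if __name__ == "__main__":
    for code in ("ja", "FR", "de"):
        print(describe_language_support(code))
```

A lookup like this could sit in front of a synthesis request to decide whether a target language is covered natively. Anything outside the list would need to be checked against the project's own documentation.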

Details

First released: 2023
License: MIT
Commercial use: Free for commercial and research use
Languages: EN · ES · FR · ZH · JA · KO
Style control: Emotion · accent · rhythm · pauses · intonation
Contributors: MIT · Tsinghua · MyShell