RVC WebUI

VITS-based voice conversion web UI for training and running voice models from short audio samples

Repository activity

Stars: 36k
Forks: 5.1k
Open Issues: 739

License

MIT

Languages

Python
Jupyter Notebook
Batchfile

About RVC WebUI

RVC WebUI is a simple voice conversion framework built on VITS. It is designed to train a voice conversion model from as little as 10 minutes of clean speech and to run conversion through a web interface. The same interface also includes a real-time voice changer.

It uses top1 retrieval to replace input features with training-set features, which reduces timbre leakage. It supports model fusion through ckpt-merge, can call UVR5 to separate vocals and accompaniment, and uses the InterSpeech2023 RMVPE pitch extraction algorithm. For real-time conversion, the project reports 170 ms end-to-end latency, or 90 ms with ASIO hardware support.
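The top1 retrieval idea can be illustrated with a minimal sketch. This is not the project's code: the function name `top1_retrieve`, the `blend` parameter, and the use of squared Euclidean distance over plain Python lists are all assumptions made for illustration. Each input frame is matched to its single nearest training-set frame, and that frame's features are used in its place.

```python
def top1_retrieve(query, bank, blend=1.0):
    """Replace each input frame with its nearest training-set frame.

    query: list of feature vectors from the input audio
    bank:  list of feature vectors collected from the training set
    blend: hypothetical knob; 1.0 = full replacement, 0.0 = keep input as-is
    """
    out = []
    for frame in query:
        # Top-1 match: the bank frame with the smallest squared distance.
        nearest = min(
            bank,
            key=lambda ref: sum((a - b) ** 2 for a, b in zip(frame, ref)),
        )
        # Mix the retrieved features with the original input features.
        out.append([blend * r + (1.0 - blend) * f for f, r in zip(frame, nearest)])
    return out


# Toy data: three training-set frames and two input frames.
bank = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query = [[0.9, 1.2], [4.0, 4.5]]
converted = top1_retrieve(query, bank)
```

With `blend=1.0`, the output contains only features that came from the training set, which is the intuition behind reduced timbre leakage from the input speaker.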

RVC WebUI runs on Python 3.8 or newer and ships batch launch scripts for Windows and shell launch scripts for Linux, plus IPEX support notes for Intel graphics users. Training and inference run locally; a hosted demo is also available for trying it without setup.

Key features

Train voice conversion models from about 10 minutes of speech
Top1 retrieval to reduce timbre leakage
Real-time voice changer interface
Model fusion with ckpt-merge
UVR5 vocal and accompaniment separation
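
Model fusion can be pictured as a weighted average of two checkpoints' parameters. The sketch below assumes that ckpt-merge style fusion interpolates linearly between matching weights; the function name `merge_checkpoints`, the `alpha` parameter, and the dict-of-lists layout are hypothetical stand-ins for illustration, not the tool's actual interface.

```python
def merge_checkpoints(weights_a, weights_b, alpha=0.5):
    """Linearly interpolate two models' weights (illustrative sketch).

    weights_a, weights_b: dicts mapping parameter name -> list of floats
    alpha: share taken from model A; (1 - alpha) comes from model B
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged


# Toy checkpoints with a single parameter each.
model_a = {"w": [0.0, 2.0]}
model_b = {"w": [4.0, 2.0]}
fused = merge_checkpoints(model_a, model_b, alpha=0.25)
```

An `alpha` closer to 1.0 keeps the fused voice closer to model A; this only makes sense when both models share the same architecture and parameter names.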

Details

First released: 2023
Platforms: Windows · macOS · Linux · Web
Deployment: self-hostable
Input: voice data (about 10 minutes recommended)
Latency: 170 ms end-to-end; 90 ms with ASIO
Framework: VITS