VITS-based voice conversion web UI for training and running voice models from short audio samples
- Stars: 36k
- Forks: 5.1k
- Open issues: 739
License: MIT
- Python
- Jupyter Notebook
- Batchfile

About RVC WebUI
RVC WebUI is a simple voice conversion framework built on VITS. It is designed to train a voice conversion model from as little as 10 minutes of clean speech and to run conversion through a web interface. The same interface also includes a real-time voice changer.
It uses top1 retrieval: each input feature is replaced with its closest match from the training set, which reduces timbre leakage. It supports model fusion through ckpt-merge, can call UVR5 to separate vocals from accompaniment, and uses the RMVPE pitch extraction algorithm from Interspeech 2023. The project reports an end-to-end latency of 170 ms, or 90 ms with ASIO hardware support.
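The retrieval step can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the squared Euclidean distance, the brute-force search, and the `blend` parameter are all assumptions made for illustration, and the feature vectors are toy values.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top1_retrieve(
    inputs: Sequence[Vector],
    training: Sequence[Vector],
    blend: float = 1.0,
) -> List[List[float]]:
    """Replace each input feature with its single nearest training feature.

    `blend` is a hypothetical knob: 1.0 means full replacement,
    0.0 leaves the input features unchanged.
    """
    if not training:
        raise ValueError("training set must not be empty")
    converted = []
    for feat in inputs:
        # Top-1: keep only the closest training-set feature.
        nearest = min(training, key=lambda t: squared_distance(feat, t))
        converted.append(
            [blend * n + (1.0 - blend) * f for n, f in zip(nearest, feat)]
        )
    return converted


# Toy 2-dimensional features; real speech features are much larger.
training_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
input_feats = [[0.9, 1.2], [4.0, 4.5]]
converted = top1_retrieve(input_feats, training_feats)
print(converted)
```

Because every output vector comes from the training set, the converted features carry the target speaker's characteristics rather than the source speaker's, which is the intuition behind reduced timbre leakage. A linear scan like this is only practical for small sets; a production system would typically use an approximate nearest-neighbor index.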
RVC WebUI runs on Python 3.8 or newer. It ships shell and batch launch scripts for Linux and Windows, plus IPEX support notes for Intel graphics users. It is designed to run locally for both training and inference; a hosted demo is also available for trying it without setup.
Key features
- Train voice conversion models from about 10 minutes of speech
- Top1 retrieval to reduce timbre leakage
- Real-time voice changer interface
- Model fusion with ckpt-merge
- UVR5 vocal and accompaniment separation
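As a rough illustration of what model fusion can look like, the sketch below linearly interpolates two sets of model weights. This is an assumption about how merging might work, not a description of ckpt-merge itself: `merge_checkpoints`, the `alpha` parameter, and the plain-dictionary weight format are hypothetical stand-ins for real checkpoint tensors.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def merge_checkpoints(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Blend two weight sets as alpha * a + (1 - alpha) * b, per parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy weights standing in for two trained voice models.
voice_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
voice_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_checkpoints(voice_a, voice_b, alpha=0.5)
print(merged)
```

Interpolation of this kind only makes sense when both models share the same architecture and parameter names, which is why the sketch rejects mismatched inputs.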
Details
- First released: 2023
- Platforms: Windows · macOS · Linux · Web
- Deployment: self-hostable
- Input: voice data, 10 minutes recommended
- Latency: 170 ms end-to-end; 90 ms with ASIO
- Framework: VITS
