Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU

Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari


We present a real-time, full-band, online voice conversion (VC) system that runs on a single CPU. For practical applications, VC must produce high-quality speech and perform real-time, online conversion with limited computational resources. Our system achieves this by combining non-linear conversion using a deep neural network with short-tap, sub-band filtering. Our evaluation demonstrates that the system 1) has an estimated computational complexity of around 2.5 GFLOPS and a measured real-time factor (RTF) of around 0.5 on a single CPU and 2) produces converted speech with a naturalness mean opinion score (MOS) of 3.4 out of 5.0.
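For readers unfamiliar with the metric, the real-time factor is conventionally the ratio of processing time to the duration of the audio processed, so a value below 1.0 means conversion runs faster than real time. The following is a minimal sketch of that ratio, not the authors' measurement code; the function names, the placeholder `process_fn`, and the 48 kHz sampling rate are assumptions made for illustration.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (RTF)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process_fn, samples, sample_rate):
    """Time a hypothetical conversion function on a buffer of samples.

    process_fn is a stand-in for any conversion routine; it is not
    part of the system described in the abstract.
    """
    start = time.perf_counter()
    process_fn(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # Assumed example: 0.5 s of compute for 1.0 s of audio.
    print(real_time_factor(0.5, 1.0))

    # Assumed 48 kHz sampling rate; the identity function stands in
    # for a real converter.
    one_second = [0.0] * 48000
    print(measure_rtf(lambda x: x, one_second, 48000))
```

Under this definition, an RTF of around 0.5 corresponds to roughly half a second of computation per second of input speech.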


Cite as: Saeki, T., Saito, Y., Takamichi, S., Saruwatari, H. (2020) Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. Proc. Interspeech 2020, 1021-1022.


@inproceedings{Saeki2020,
  author={Takaaki Saeki and Yuki Saito and Shinnosuke Takamichi and Hiroshi Saruwatari},
  title={{Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1021--1022}
}