The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016

Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda

This paper presents the NU-NAIST voice conversion (VC) system for the Voice Conversion Challenge 2016 (VCC 2016), developed by a joint team from Nagoya University and Nara Institute of Science and Technology. Statistical VC based on a Gaussian mixture model makes it possible to convert the speaker identity of a source speaker's voice into that of a target speaker by converting several speech parameters. However, various factors, such as parameterization errors and over-smoothing effects, usually degrade the speech quality of the converted voice. To address this issue, we previously proposed a direct waveform modification technique based on spectral differential filtering and successfully applied it to singing voice conversion, where excitation features do not necessarily need to be converted. In this paper, we propose a method for applying this technique to a standard VC task, where excitation feature conversion is needed. The VCC 2016 results demonstrate that the NU-NAIST VC system developed with the proposed method yields the best conversion accuracy for speaker identity (a correct rate of more than 70%) and a fairly high naturalness score (a mean opinion score above 3). This paper presents detailed descriptions of the NU-NAIST VC system and additional results of its performance evaluation.
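To make the idea of spectral differential filtering concrete, the sketch below filters a source waveform directly with a filter built from the difference between converted and source cepstra, instead of resynthesizing speech from converted parameters with a vocoder. It is a minimal illustration under stated assumptions, not the NU-NAIST implementation: it uses plain (linear-frequency) cepstra rather than mel-cepstra, assumes the convention H(z) = exp(sum_m d[m] z^-m), applies one time-invariant truncated FIR filter rather than frame-by-frame filtering, takes the converted cepstrum as a given input (in the described system it would come from GMM-based conversion), and does not handle excitation (F0) conversion. The function names and the `num_taps` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def spectral_differential(src_cep: Sequence[float],
                          conv_cep: Sequence[float]) -> List[float]:
    """Differential cepstrum: converted minus source, coefficient by coefficient."""
    if len(src_cep) != len(conv_cep):
        raise ValueError("source and converted cepstra must have the same order")
    return [c - s for s, c in zip(src_cep, conv_cep)]


def cepstrum_to_impulse_response(diff_cep: Sequence[float],
                                 num_taps: int) -> List[float]:
    """Truncated impulse response of H(z) = exp(sum_m d[m] z^-m).

    Uses the recursion n*h[n] = sum_{k=1}^{n} k*d[k]*h[n-k], obtained by
    differentiating H(z) = exp(D(z)); d[k] is treated as zero for k > M.
    """
    if num_taps < 1:
        raise ValueError("num_taps must be at least 1")
    order = len(diff_cep) - 1
    h = [0.0] * num_taps
    h[0] = math.exp(diff_cep[0])
    for n in range(1, num_taps):
        acc = 0.0
        for k in range(1, min(n, order) + 1):
            acc += k * diff_cep[k] * h[n - k]
        h[n] = acc / n
    return h


def differential_filter(signal: Sequence[float],
                        src_cep: Sequence[float],
                        conv_cep: Sequence[float],
                        num_taps: int = 64) -> List[float]:
    """Modify the source waveform directly with the differential filter.

    Assumption: one time-invariant filter for the whole segment; a practical
    system would update the filter frame by frame.
    """
    diff = spectral_differential(src_cep, conv_cep)
    h = cepstrum_to_impulse_response(diff, num_taps)
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        # Causal FIR convolution with the truncated impulse response.
        for k in range(min(n + 1, num_taps)):
            acc += h[k] * signal[n - k]
        out[n] = acc
    return out
```

When the converted cepstrum equals the source cepstrum, the differential is zero, the filter reduces to an identity, and the waveform passes through unchanged. This is the intuition behind the approach: only the spectral difference is imposed on the natural source waveform, so quality losses from full vocoder resynthesis are avoided.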

DOI: 10.21437/Interspeech.2016-970

Cite as

Kobayashi, K., Takamichi, S., Nakamura, S., Toda, T. (2016) The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016. Proc. Interspeech 2016, 1667-1671.

author={Kazuhiro Kobayashi and Shinnosuke Takamichi and Satoshi Nakamura and Tomoki Toda},
title={The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016},
booktitle={Interspeech 2016},