16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Statistical Singing Voice Conversion Based on Direct Waveform Modification with Global Variance

Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

NAIST, Japan

This paper presents techniques to improve the quality of voices generated through statistical singing voice conversion with direct waveform modification based on spectrum differential (DIFFSVC). The DIFFSVC method makes it possible to convert the singing voice characteristics of a source singer into those of a target singer without using vocoder-based waveform generation. However, the quality of the converted singing voice is still degraded compared to that of a natural singing voice due to various factors, such as over-smoothing of the converted spectral parameter trajectory. To alleviate this over-smoothing, we propose a technique to restore the global variance of the converted spectral parameter trajectory within the framework of the DIFFSVC method. We also propose another technique specifically to avoid over-smoothing at unvoiced frames. Results of subjective and objective evaluations demonstrate that, compared to the conventional DIFFSVC, the proposed techniques significantly improve the speech quality of the converted singing voice while preserving the conversion accuracy of singer identity.
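
To illustrate what restoring the global variance of a parameter trajectory can mean in practice, the sketch below rescales each dimension of a trajectory around its own mean so that its variance matches a target value. This is a generic variance-scaling postfilter written for illustration only; the abstract does not specify how the authors integrate global variance into DIFFSVC, so this should not be read as their method. The function name `restore_global_variance`, the list-of-frames layout, and the `target_gv` argument are all assumptions.

```python
from math import sqrt
from statistics import fmean, pvariance


def restore_global_variance(trajectory, target_gv):
    """Rescale each dimension so its variance over time matches target_gv.

    trajectory: list of frames, each a list of floats (assumed layout).
    target_gv: per-dimension target variances (assumed to be measured
               elsewhere, e.g. on natural data).
    """
    # Transpose so that each entry holds one dimension across all frames.
    dims = list(zip(*trajectory))
    scaled_dims = []
    for values, gv in zip(dims, target_gv):
        mean = fmean(values)
        var = pvariance(values, mu=mean)
        # Leave constant dimensions untouched to avoid dividing by zero.
        scale = sqrt(gv / var) if var > 0 else 1.0
        # Expand or shrink the deviations from the mean; the mean is kept.
        scaled_dims.append([mean + scale * (v - mean) for v in values])
    # Transpose back to the frame-major layout.
    return [list(frame) for frame in zip(*scaled_dims)]
```

An over-smoothed trajectory has a smaller variance than natural data, so the scale factor is greater than one and the deviations from the mean are widened. The per-dimension mean is left unchanged, which is one simple design choice among several possible ones.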

Bibliographic reference.  Kobayashi, Kazuhiro / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2015): "Statistical singing voice conversion based on direct waveform modification with global variance", In INTERSPEECH-2015, 2754-2758.