This paper presents techniques to improve the quality of voices generated through statistical singing voice conversion with direct waveform modification based on spectrum differential (DIFFSVC). The DIFFSVC method makes it possible to convert singing voice characteristics of a source singer into those of a target singer without using vocoder-based waveform generation. However, quality of the converted singing voice still degrades compared to that of a natural singing voice due to various factors, such as the over-smoothing of the converted spectral parameter trajectory. To alleviate this over-smoothing, we propose a technique to restore the global variance of the converted spectral parameter trajectory within the framework of the DIFFSVC method. We also propose another technique to specifically avoid over-smoothing at unvoiced frames. Results of subjective and objective evaluations demonstrate that the proposed techniques significantly improve speech quality of the converted singing voice while preserving the conversion accuracy of singer identity compared to the conventional DIFFSVC.
Bibliographic reference. Kobayashi, Kazuhiro / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2015): "Statistical singing voice conversion based on direct waveform modification with global variance", In INTERSPEECH-2015, 2754-2758.