Exemplar-Based Spectral Detail Compensation for Voice Conversion

Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao, Hsin-Min Wang


Most voice conversion (VC) systems are established under the vocoder-based VC framework. When performing spectral conversion (SC) under this framework, low-dimensional spectral features, such as mel-cepstral coefficients (MCCs), are often adopted to represent the high-dimensional spectral envelopes. The joint density Gaussian mixture model (GMM)-based SC method with the STRAIGHT vocoder is a well-known representative. Although it is reasonably effective, the loss of spectral details in the converted spectral envelopes inevitably deteriorates speech quality and similarity. To overcome this problem, we propose a novel exemplar-based spectral detail compensation method for VC. In the offline stage, paired dictionaries of source spectral envelopes and target spectral details are constructed. In the online stage, the locally linear embedding (LLE) algorithm is applied to predict the target spectral details from the source spectral envelopes, and the predicted spectral details are then used to compensate the converted spectral envelopes obtained by a baseline GMM-based SC method with the STRAIGHT vocoder. Experimental results show that the proposed method notably improves the baseline system in both objective and subjective tests.
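The online LLE-based prediction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: for each source frame, it finds the k nearest exemplars in the source dictionary, solves for the sum-to-one reconstruction weights in closed form via the local Gram matrix, and applies the same weights to the paired target-detail exemplars. The function name, dictionary shapes, and regularization constant are all assumptions for the sketch; the paper's actual features and dictionary construction differ.

```python
import numpy as np

def lle_predict(x, src_dict, tgt_dict, k=8, reg=1e-3):
    """Predict a target spectral-detail vector for one source frame x
    via locally linear embedding over paired exemplar dictionaries.

    x:        (D,)  source spectral-envelope frame
    src_dict: (N, D)  source spectral-envelope exemplars
    tgt_dict: (N, D2) paired target spectral-detail exemplars
    """
    # 1) find the k nearest source exemplars to x
    dists = np.linalg.norm(src_dict - x, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = src_dict[idx]                 # (k, D)

    # 2) solve for weights w minimizing ||x - w @ neighbors||^2
    #    subject to sum(w) = 1 (closed form via the local Gram matrix)
    diff = neighbors - x                      # (k, D)
    G = diff @ diff.T                         # local Gram matrix (k, k)
    G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # regularize for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                              # enforce sum-to-one constraint

    # 3) apply the same weights to the paired target exemplars
    return w @ tgt_dict[idx]
```

In the full method, the predicted detail vector would then be added to the converted spectral envelope produced by the GMM-based baseline before waveform synthesis with the STRAIGHT vocoder.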


DOI: 10.21437/Interspeech.2018-1662

Cite as: Peng, Y.-H., Hwang, H.-T., Wu, Y., Tsao, Y., Wang, H.-M. (2018) Exemplar-Based Spectral Detail Compensation for Voice Conversion. Proc. Interspeech 2018, 486-490, DOI: 10.21437/Interspeech.2018-1662.


@inproceedings{Peng2018,
  author={Yu-Huai Peng and Hsin-Te Hwang and Yichiao Wu and Yu Tsao and Hsin-Min Wang},
  title={Exemplar-Based Spectral Detail Compensation for Voice Conversion},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={486--490},
  doi={10.21437/Interspeech.2018-1662},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1662}
}