ISCA Archive Interspeech 2013

Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training

Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen

In this paper, we propose a discriminative training (DT) method to alleviate the muffled sound effect caused by over-smoothing in Gaussian mixture model (GMM)-based voice conversion (VC). In conventional GMM-based VC, we often observe a large degree of ambiguity among the acoustic classes (generative classes) that are determined by the source feature vectors and used to generate the converted feature vectors; this ambiguity causes the "muffled sound" effect in the converted voice. The proposed DT method refines the parameters of the maximum likelihood (ML)-trained joint density GMM (JDGMM) in the training stage, reducing the ambiguity among acoustic classes and thereby alleviating the muffled sound effect. Experimental results demonstrate that the DT method significantly enhances the discriminative power between acoustic classes in the objective evaluation and effectively alleviates the muffled sound effect in the subjective evaluation.


doi: 10.21437/Interspeech.2013-668

Cite as: Hwang, H.-T., Tsao, Y., Wang, H.-M., Wang, Y.-R., Chen, S.-H. (2013) Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training. Proc. Interspeech 2013, 3062-3066, doi: 10.21437/Interspeech.2013-668

@inproceedings{hwang13_interspeech,
  author={Hsin-Te Hwang and Yu Tsao and Hsin-Min Wang and Yih-Ru Wang and Sin-Horng Chen},
  title={{Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3062--3066},
  doi={10.21437/Interspeech.2013-668}
}