ISCA Archive Interspeech 2013

Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion

Ling-Hui Chen, Zhen-Hua Ling, Yan Song, Li-Rong Dai

This paper presents a new spectral modeling and conversion method for voice conversion. In contrast to conventional Gaussian mixture model (GMM) based methods, we use restricted Boltzmann machines (RBMs) as probability density models for the joint distribution of source and target spectral features. The Gaussian distribution of each GMM mixture component is replaced by an RBM, which can better capture the inter-dimensional and inter-speaker correlations within the joint spectral features. Spectral conversion is performed under the maximum conditional output probability criterion. Our experimental results show that the proposed method significantly improves similarity and naturalness compared with the conventional GMM-based method.
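
To make the conversion criterion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single Gaussian-Bernoulli RBM with unit visible variances over the joint vector v = [x; y] (source features x, target features y), and it finds the target by plain gradient descent on the RBM free energy, since maximizing p(y | x) over y is equivalent to minimizing the free energy of [x; y] with x held fixed. The mixture of RBMs described in the abstract is not modeled here, and the function names, the parameter shapes, and the gradient-based search are all illustrative assumptions.

```python
import numpy as np

def free_energy(v, W, a, b):
    """Free energy F(v) of a unit-variance Gaussian-Bernoulli RBM (assumed form).

    v: visible vector (D,), W: weights (D, H), a: visible bias (D,), b: hidden bias (H,).
    The model density satisfies p(v) proportional to exp(-F(v)).
    """
    pre = b + v @ W                       # hidden pre-activations, shape (H,)
    # np.logaddexp(0, z) is a numerically stable softplus log(1 + exp(z))
    return 0.5 * np.sum((v - a) ** 2) - np.sum(np.logaddexp(0.0, pre))

def free_energy_grad(v, W, a, b):
    """Gradient of F(v) with respect to the visible vector v."""
    pre = b + v @ W
    h = 1.0 / (1.0 + np.exp(-pre))        # p(h_j = 1 | v)
    return (v - a) - W @ h

def convert(x, W, a, b, steps=200, lr=0.1):
    """Search for argmax_y p(y | x) by gradient descent on F([x; y]).

    Hypothetical helper: x stays fixed and only the target part y is updated.
    """
    dx = x.shape[0]
    y = a[dx:].copy()                     # start from the target-side visible bias
    for _ in range(steps):
        v = np.concatenate([x, y])
        y -= lr * free_energy_grad(v, W, a, b)[dx:]
    return y
```

With trained parameters, `convert(x, W, a, b)` would return one converted target frame for one source frame `x`; the step size and iteration count are placeholders that would need tuning for real spectral features.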


doi: 10.21437/Interspeech.2013-666

Cite as: Chen, L.-H., Ling, Z.-H., Song, Y., Dai, L.-R. (2013) Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion. Proc. Interspeech 2013, 3052-3056, doi: 10.21437/Interspeech.2013-666

@inproceedings{chen13k_interspeech,
  author={Ling-Hui Chen and Zhen-Hua Ling and Yan Song and Li-Rong Dai},
  title={{Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3052--3056},
  doi={10.21437/Interspeech.2013-666}
}