14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Joint Spectral Distribution Modeling Using Restricted Boltzmann Machines for Voice Conversion

Ling-Hui Chen, Zhen-Hua Ling, Yan Song, Li-Rong Dai

USTC, China

This paper presents a new spectral modeling and conversion method for voice conversion. In contrast to conventional Gaussian mixture model (GMM) based methods, we use restricted Boltzmann machines (RBMs) as probability density models of the joint distribution of source and target spectral features. The Gaussian distribution in each mixture component of the GMM is replaced by an RBM, which can better capture the inter-dimensional and inter-speaker correlations within the joint spectral features. Spectral conversion is performed under the maximum conditional output probability criterion. Our experimental results show that the proposed method significantly improves the similarity and naturalness of the converted speech compared with the conventional GMM-based method.
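The conversion criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian-Bernoulli RBM with unit-variance visible units (no mixture), random toy parameters instead of trained ones, and plain gradient descent as the optimizer. All names (`free_energy`, `convert`, `a`, `b`, `W`) are hypothetical. With the joint vector v = [x; y], maximizing p(y | x) is equivalent to minimizing the RBM free energy F([x; y]) over y with x held fixed.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy(v, a, b, W):
    # Free energy of a unit-variance Gaussian-Bernoulli RBM (assumed form):
    # F(v) = 0.5 * ||v - a||^2 - sum_j softplus(b_j + v^T W[:, j]),
    # so that p(v) is proportional to exp(-F(v)).
    return 0.5 * np.sum((v - a) ** 2) - np.sum(softplus(b + v @ W))

def convert(x, a, b, W, n_steps=200, lr=0.5):
    # Approximate argmax_y p(y | x) by gradient descent on F([x; y]) over y.
    dx = x.shape[0]
    a_y, W_x, W_y = a[dx:], W[:dx], W[dx:]
    y = a_y.copy()  # start from the target-side visible bias
    for _ in range(n_steps):
        h = sigmoid(b + x @ W_x + y @ W_y)  # hidden activation probabilities
        grad = (y - a_y) - W_y @ h          # dF/dy with x held fixed
        y -= lr * grad
    return y

# Toy setup: 4 source dims, 4 target dims, 8 hidden units (all assumed).
rng = np.random.default_rng(0)
dx, dy, n_hidden = 4, 4, 8
a = rng.normal(size=dx + dy)                      # visible biases
b = rng.normal(size=n_hidden)                     # hidden biases
W = 0.1 * rng.normal(size=(dx + dy, n_hidden))    # small weights keep F well-conditioned
x = rng.normal(size=dx)                           # stand-in for a source spectral frame
y_hat = convert(x, a, b, W)
```

In the method the abstract describes, one such RBM replaces the Gaussian of each mixture component; how the components are weighted and combined during conversion is not specified here and is left out of the sketch.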


Bibliographic reference. Chen, Ling-Hui / Ling, Zhen-Hua / Song, Yan / Dai, Li-Rong (2013): "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion", In INTERSPEECH-2013, 3052-3056.