This paper presents a new spectral modeling and conversion method for voice conversion. In contrast to conventional Gaussian mixture model (GMM) based methods, we use restricted Boltzmann machines (RBMs) as probability density models of the joint distribution of source and target spectral features. The Gaussian distribution in each mixture component of the GMM is replaced by an RBM, which can better capture the inter-dimensional and inter-speaker correlations within the joint spectral features. Spectral conversion is performed under the maximum conditional output probability criterion. Our experimental results show that the proposed method significantly improves the similarity and naturalness of the converted speech compared with the conventional GMM-based method.
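The conversion criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian-Bernoulli RBM with unit visible variance (rather than a mixture of RBMs), randomly initialized parameters in place of trained ones, and plain gradient ascent as the optimizer. The names `free_energy`, `convert`, `W`, `b`, `c`, `steps`, and `lr` are hypothetical. Given a source vector x, the sketch searches for the target vector y that maximizes p(y | x), which is equivalent to minimizing the RBM free energy of the joint vector [x; y] with x held fixed.

```python
import numpy as np

def softplus(a):
    # Numerically stable log(1 + exp(a)).
    return np.logaddexp(0.0, a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def free_energy(v, W, b, c):
    # Free energy of a Gaussian-Bernoulli RBM with unit visible variance
    # (assumption): F(v) = 0.5 * ||v - b||^2 - sum_j softplus(c_j + W[:, j] . v)
    return 0.5 * np.sum((v - b) ** 2) - np.sum(softplus(c + W.T @ v))

def convert(x, W, b, c, steps=200, lr=0.1):
    # Gradient ascent on log p(y | x) = -F([x; y]) + const, with x fixed.
    dx = x.shape[0]
    y = b[dx:].copy()  # start from the visible bias of the target part
    for _ in range(steps):
        v = np.concatenate([x, y])
        grad = -(v - b) + W @ sigmoid(c + W.T @ v)  # gradient of -F w.r.t. v
        y += lr * grad[dx:]                         # update only the target part
    return y

# Toy parameters (assumption: random stand-ins for a trained model).
rng = np.random.default_rng(0)
dx, dy, nh = 2, 2, 3
W = 0.1 * rng.standard_normal((dx + dy, nh))  # visible-to-hidden weights
b = rng.standard_normal(dx + dy)              # visible biases
c = rng.standard_normal(nh)                   # hidden biases
x = rng.standard_normal(dx)                   # source feature vector
y_hat = convert(x, W, b, c)                   # converted target features
```

With the small weights used here the objective is concave in y, so the ascent settles at a single maximum; a trained model with larger weights may have several local maxima, in which case the result depends on the starting point.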
Bibliographic reference. Chen, Ling-Hui / Ling, Zhen-Hua / Song, Yan / Dai, Li-Rong (2013): "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion", In INTERSPEECH-2013, 3052-3056.