15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Voice Conversion Using Generative Trained Deep Neural Networks with Multiple Frame Spectral Envelopes

Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai

USTC, China

This paper presents a deep neural network (DNN) based spectral envelope conversion method. A global DNN is employed to model the complex non-linear mapping relationship between the spectral envelopes of the source and target speakers. The proposed DNN is generatively trained layer by layer using a cascade of two restricted Boltzmann machines (RBMs) and a bidirectional associative memory (BAM), all of which are generative models estimated with the contrastive divergence algorithm. Furthermore, multiple-frame spectral envelopes are adopted in place of dynamic features for better modeling by the DNN. Subjective experimental results validate the superiority of the proposed method.
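The layer-wise generative pre-training mentioned above can be illustrated with a minimal sketch: each RBM layer is trained with one-step contrastive divergence (CD-1), and its hidden activations then serve as training data for the next layer. This is a generic illustration of RBM stacking, not the paper's implementation; all names, dimensions, and hyperparameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step (sample hidden, reconstruct visible)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 gradient approximation, averaged over the batch
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_vis += self.lr * (v0 - v1).mean(axis=0)
        self.b_hid += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error

# Toy layer-wise pre-training: stack two RBMs; each layer is trained
# on the hidden activations of the layer below (fake binarized frames).
data = (rng.random((64, 40)) > 0.5).astype(float)
layers = [RBM(40, 30), RBM(30, 20)]
x = data
for rbm in layers:
    for epoch in range(20):
        err = rbm.cd1_step(x)
    x = rbm.hidden_probs(x)  # activations feed the next layer
```

After pre-training, the stacked weights would initialize the mapping DNN, which is then fine-tuned discriminatively; the BAM layer in the paper plays an analogous generative role at the source-to-target junction.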


Bibliographic reference. Chen, Ling-Hui / Ling, Zhen-Hua / Dai, Li-Rong (2014): "Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes", In INTERSPEECH-2014, 2313-2317.