This paper presents a deep neural network (DNN) based spectral envelope conversion method. A global DNN is employed to model the complex non-linear mapping between the spectral envelopes of source and target speakers. The proposed DNN is generatively trained layer-by-layer using a cascade of two restricted Boltzmann machines (RBMs) and a bidirectional associative memory (BAM), which are generative models estimated with the contrastive divergence algorithm. Further, multiple-frame spectral envelopes are adopted in place of dynamic features for better modeling by the DNN. Subjective experimental results validate the superiority of the proposed method.
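The layer-wise generative pre-training described above rests on RBMs fitted with contrastive divergence. As a rough illustration (not the authors' implementation, and omitting the BAM and the multiple-frame feature setup), the following sketch trains a Bernoulli-Bernoulli RBM with one-step contrastive divergence (CD-1) on toy binary data; all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0_prob = sigmoid(v0 @ self.W + self.c)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step back to visible, then to hidden.
        v1_prob = sigmoid(h0 @ self.W.T + self.b)
        h1_prob = sigmoid(v1_prob @ self.W + self.c)
        # CD-1 gradient approximation and parameter update.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
        self.b += self.lr * (v0 - v1_prob).mean(axis=0)
        self.c += self.lr * (h0_prob - h1_prob).mean(axis=0)
        # Mean squared reconstruction error, a common training monitor.
        return float(np.mean((v0 - v1_prob) ** 2))

# Toy binary data standing in for binarized spectral features.
data = (rng.random((64, 16)) < 0.3).astype(float)
rbm = RBM(n_visible=16, n_hidden=8)
errs = [rbm.cd1_step(data) for _ in range(200)]
print(f"reconstruction error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

In the layer-wise scheme the abstract describes, a trained RBM's hidden activations would serve as input data for training the next model in the stack, after which the stacked weights initialize the DNN for fine-tuning.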
Bibliographic reference. Chen, Ling-Hui / Ling, Zhen-Hua / Dai, Li-Rong (2014): "Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes", In INTERSPEECH-2014, 2313-2317.