Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Ruibo Fu, Jianhua Tao, Yibin Zheng, Zhengqi Wen


The fundamental frequency and the spectral parameters of speech are correlated, so the mapping learned from linguistic features to one of them can be leveraged to help determine the other. Conventional methods treat all acoustic features as a single stream for acoustic modeling, while multi-task learning (MTL) methods model several targets under one global cost function. To improve the accuracy of the acoustic model, we apply progressive deep neural networks (PDNN) to acoustic modeling in statistical parametric speech synthesis (SPSS). Each type of acoustic feature is modeled by a separate sub-network with its own cost function, and knowledge is transferred between sub-networks through lateral connections. The sub-networks of the PDNN are trained step by step, so that each can reach its own optimum. Experiments compare the proposed PDNN-based SPSS system with standard DNN methods. As a contrast to transfer learning, the MTL method is also applied to both the PDNN and the DNN structures. The computational complexity, the prediction sequence, and the number of hierarchies of the PDNN are investigated. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
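
The sub-network idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the synthetic data, the choice to predict F0 before the spectrum, the point where the lateral connection enters, and all names (`init_column`, `forward`, `train_column`) are assumptions made for illustration. Column 1 is trained on its own cost function and then frozen; column 2 is trained on a second cost function and receives column 1's hidden activations through a lateral connection.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_column(n_in, n_hid, n_out, n_lat=0):
    """One sub-network (column) with a single tanh hidden layer (assumed size)."""
    p = {"W": rng.normal(0, 0.1, (n_in, n_hid)), "b": np.zeros(n_hid),
         "V": rng.normal(0, 0.1, (n_hid, n_out)), "c": np.zeros(n_out)}
    if n_lat:  # lateral weights from the previous column's hidden layer
        p["U"] = rng.normal(0, 0.1, (n_lat, n_out))
    return p

def forward(p, x, lateral=None):
    h = np.tanh(x @ p["W"] + p["b"])
    y = h @ p["V"] + p["c"]
    if lateral is not None:
        y = y + lateral @ p["U"]  # knowledge transferred via the lateral connection
    return h, y

def mse(p, x, target, lateral=None):
    return float(np.mean((forward(p, x, lateral)[1] - target) ** 2))

def train_column(p, x, target, lateral=None, lr=0.02, steps=500):
    """Plain gradient descent on this column's own squared-error cost."""
    n = len(x)
    for _ in range(steps):
        h, y = forward(p, x, lateral)
        err = (y - target) / n                 # d(cost)/d(output)
        dh = (err @ p["V"].T) * (1.0 - h ** 2)  # backprop through tanh
        grads = {"V": h.T @ err, "c": err.sum(0),
                 "W": x.T @ dh, "b": dh.sum(0)}
        if lateral is not None:
            grads["U"] = lateral.T @ err
        for k, g in grads.items():
            p[k] -= lr * g
    return mse(p, x, target, lateral)

# Synthetic stand-ins for linguistic features and two correlated acoustic streams.
x = rng.normal(size=(256, 8))
f0 = np.tanh(x @ rng.normal(size=(8, 1)))
spec = np.tanh(x @ rng.normal(size=(8, 4))) + 0.5 * f0

# Step 1: train the first column (F0 here, an assumed order) to its own optimum.
col1 = init_column(8, 16, 1)
mse_f0_before = mse(col1, x, f0)
mse_f0 = train_column(col1, x, f0)

# Step 2: freeze column 1 and train column 2 with the lateral input.
frozen = {k: v.copy() for k, v in col1.items()}
h1, _ = forward(col1, x)  # fixed features from the frozen column
col2 = init_column(8, 16, 4, n_lat=16)
mse_spec_before = mse(col2, x, spec, lateral=h1)
mse_spec = train_column(col2, x, spec, lateral=h1)

print(f"F0 MSE:   {mse_f0_before:.4f} -> {mse_f0:.4f}")
print(f"Spec MSE: {mse_spec_before:.4f} -> {mse_spec:.4f}")
```

Because column 2 only reads `h1` and never updates `col1`, the first sub-network keeps the optimum it reached in step 1, which is the step-by-step training property the abstract contrasts with a single global cost function.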


DOI: 10.21437/Interspeech.2018-1265

Cite as: Fu, R., Tao, J., Zheng, Y., Wen, Z. (2018) Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis. Proc. Interspeech 2018, 907-911, DOI: 10.21437/Interspeech.2018-1265.


@inproceedings{Fu2018,
  author={Ruibo Fu and Jianhua Tao and Yibin Zheng and Zhengqi Wen},
  title={Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={907--911},
  doi={10.21437/Interspeech.2018-1265},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1265}
}