Effects of Dimensional Input on Paralinguistic Information Perceived from Synthesized Dialogue Speech with Neural Network

Masaki Yokoyama, Tomohiro Nagata, Hiroki Mori


A novel method of controlling paralinguistic information in neural network-based dialogue speech synthesis is proposed. Control is achieved by feeding continuous-valued emotion dimensions into the input layer of the neural network. Compared with the method based on the multiple regression HMM, the naturalness of the synthesized speech was improved. The controllability of paralinguistic information was evaluated by examining the shift in the distribution of synthesized parameters. A subjective evaluation test revealed a moderate correlation between given and perceived paralinguistic information, though it was less apparent than with the multiple regression HMM-based method.
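
The idea of feeding emotion dimensions into the input layer can be illustrated with a minimal sketch. This is not the authors' implementation: the feature sizes, the two-dimensional emotion vector, the single hidden layer, and the function names `build_network_input` and `forward` are all assumptions made for illustration. It shows only how continuous emotion values might be appended to each frame's linguistic features so that changing them shifts the predicted acoustic parameters.

```python
import numpy as np


def build_network_input(linguistic_feats, emotion_dims):
    """Append the same utterance-level emotion vector to every frame.

    linguistic_feats: array of shape (n_frames, n_linguistic)
    emotion_dims:     array of shape (n_emotion,), continuous values
    Returns an array of shape (n_frames, n_linguistic + n_emotion).
    """
    linguistic_feats = np.asarray(linguistic_feats, dtype=np.float64)
    emotion_dims = np.asarray(emotion_dims, dtype=np.float64)
    n_frames = linguistic_feats.shape[0]
    # Repeat the emotion vector once per frame.
    tiled = np.tile(emotion_dims, (n_frames, 1))
    return np.concatenate([linguistic_feats, tiled], axis=1)


def forward(x, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer network mapping inputs to acoustic parameters."""
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2


# Toy sizes (assumed): 5 frames, 8 linguistic features, 2 emotion
# dimensions, 16 hidden units, 3 acoustic output parameters.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
w1 = rng.standard_normal((10, 16))
b1 = np.zeros(16)
w2 = rng.standard_normal((16, 3))
b2 = np.zeros(3)

# Same linguistic content, two different emotion settings.
out_neutral = forward(build_network_input(feats, [0.0, 0.0]), w1, b1, w2, b2)
out_shifted = forward(build_network_input(feats, [1.0, -1.0]), w1, b1, w2, b2)

# Mean change in the predicted parameters caused by the emotion input alone.
shift = (out_shifted - out_neutral).mean(axis=0)
print(shift)
```

With trained rather than random weights, comparing the distributions of `out_neutral` and `out_shifted` over many utterances would correspond loosely to examining the shift of the synthesized parameter distribution described above.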


DOI: 10.21437/Interspeech.2018-2042

Cite as: Yokoyama, M., Nagata, T., Mori, H. (2018) Effects of Dimensional Input on Paralinguistic Information Perceived from Synthesized Dialogue Speech with Neural Network. Proc. Interspeech 2018, 3053-3056, DOI: 10.21437/Interspeech.2018-2042.


@inproceedings{Yokoyama2018,
  author={Masaki Yokoyama and Tomohiro Nagata and Hiroki Mori},
  title={Effects of Dimensional Input on Paralinguistic Information Perceived from Synthesized Dialogue Speech with Neural Network},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3053--3056},
  doi={10.21437/Interspeech.2018-2042},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2042}
}