An Open Source Emotional Speech Corpus for Human Robot Interaction Applications

Jesin James, Li Tian, Catherine Inez Watson


To further understand the wide array of emotions embedded in human speech, we introduce a strictly guided simulated emotional speech corpus. In contrast to existing speech corpora, it was constructed to maintain an equal distribution of four long vowels of New Zealand English. This balance facilitates studies comparing emotion-related formant and glottal source features. The corpus contains five primary and five secondary emotions. Secondary emotions are important in Human-Robot Interaction (HRI) for modelling natural conversations between humans and robots, yet few existing speech resources support their study, which motivated the creation of this corpus. A large-scale perception test with 120 participants showed that the corpus yields approximately 70% and 40% accuracy in the correct classification of primary and secondary emotions, respectively. The reasons behind the difference in perception accuracy between the two emotion types are further investigated. A preliminary prosodic analysis of the corpus shows significant differences among the emotions. The corpus is made public at: github.com/tli725/JL-Corpus.
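For readers who want to run a comparison in the spirit of the preliminary prosodic analysis, the following is a minimal sketch of a per-emotion mean pitch (F0) summary over the released recordings, using librosa's pYIN pitch tracker. It assumes a local clone of the GitHub repository, .wav files somewhere under that directory, and an underscore-delimited filename whose second field is the emotion label; none of these layout details are specified in the abstract, so adjust them to the actual corpus structure.

    # Hedged sketch: per-emotion mean F0 over the corpus.
    # Assumed (not stated in the paper): local clone path, flat .wav files,
    # and filenames like speaker_emotion_....wav (emotion = second field).
    from pathlib import Path
    from collections import defaultdict

    import numpy as np
    import librosa

    CORPUS_DIR = Path("JL-Corpus")  # local clone of github.com/tli725/JL-Corpus

    f0_by_emotion = defaultdict(list)
    for wav in CORPUS_DIR.rglob("*.wav"):
        emotion = wav.stem.split("_")[1]   # assumed filename convention
        y, sr = librosa.load(wav, sr=None)
        # pYIN returns NaN for unvoiced frames, so nanmean keeps voiced frames only.
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y,
            fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C7"),
            sr=sr,
        )
        f0_by_emotion[emotion].append(np.nanmean(f0))

    for emotion, means in sorted(f0_by_emotion.items()):
        print(f"{emotion:>12s}: mean F0 = {np.mean(means):6.1f} Hz (n={len(means)})")

Mean F0 is only one of the prosodic features one would examine (duration and intensity are natural companions), but it is typically the most discriminative cue between high- and low-arousal emotions, which makes it a reasonable first check.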


 DOI: 10.21437/Interspeech.2018-1349

Cite as: James, J., Tian, L., Inez Watson, C. (2018) An Open Source Emotional Speech Corpus for Human Robot Interaction Applications. Proc. Interspeech 2018, 2768-2772, DOI: 10.21437/Interspeech.2018-1349.


@inproceedings{James2018,
  author={Jesin James and Li Tian and Catherine {Inez Watson}},
  title={An Open Source Emotional Speech Corpus for Human Robot Interaction Applications},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2768--2772},
  doi={10.21437/Interspeech.2018-1349},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1349}
}