The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity

Xingfeng Li, Masato Akagi


The multi-layered perceptual process of emotion in human speech plays an essential role in affective computing for understanding a speaker's state. However, a comprehensive analysis of the emotion perception process remains challenging due to the lack of powerful acoustic features that allow accurate inference of emotion across speaker and language diversities. Most previous research studies acoustic features using the Fourier transform, the short-time Fourier transform, or linear predictive coding. Although these features may be useful for stationary signals within short frames, they may not adequately capture localized events, since speech transmits emotion information dynamically over time. This paper introduces a set of acoustic features derived from wavelet transform analysis of the speech signal and, specifically, models the perceptual process of emotion across language diversity. To this end, the proposed features are analyzed in a three-layer emotion perception model across multiple languages. Experiments show that the proposed acoustic features significantly enhance the perceptual process of emotion and yield better results in multilingual emotion recognition compared with the widely used prosodic and spectral features, as well as their combination, reported in the literature.
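The paper does not reproduce its feature-extraction pipeline here, but the core idea of wavelet-based features (multi-resolution statistics rather than fixed-window Fourier analysis) can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy example, not the authors' method: it uses a Haar discrete wavelet transform implemented directly in NumPy and summarizes each decomposition level by its log-energy, one common way to turn wavelet coefficients into a fixed-length feature vector.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.

    Returns approximation (low-pass) and detail (high-pass) coefficients,
    each half the length of the input.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = x[:-1]  # drop the odd trailing sample so pairs line up
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_features(signal, levels=4):
    """Toy feature vector: log-energy of detail coefficients per level,
    plus the final approximation band. (Hypothetical summary statistic,
    not the feature set used in the paper.)
    """
    feats = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(np.log(np.sum(detail ** 2) + 1e-12))
    feats.append(np.log(np.sum(approx ** 2) + 1e-12))
    return np.array(feats)

# Example on a synthetic chirp standing in for a speech frame
t = np.linspace(0.0, 1.0, 1024)
sig = np.sin(2.0 * np.pi * (50.0 + 200.0 * t) * t)
print(wavelet_features(sig))  # 5 values: 4 detail bands + approximation band
```

Because each level halves the time resolution while narrowing the frequency band, such features localize transient events that a single fixed-length Fourier window would smear, which is the motivation the abstract gives for moving beyond Fourier-based analysis.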


 DOI: 10.21437/Interspeech.2019-2229

Cite as: Li, X., Akagi, M. (2019) The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity. Proc. Interspeech 2019, 3262-3266, DOI: 10.21437/Interspeech.2019-2229.


@inproceedings{Li2019,
  author={Xingfeng Li and Masato Akagi},
  title={{The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3262--3266},
  doi={10.21437/Interspeech.2019-2229},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2229}
}