Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition

Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee


Pain is an unpleasant internal sensation caused by bodily damage or physical illness, and its expression varies with personal attributes. In this work, we propose an age- and gender-embedded latent acoustic representation learned using a conditional maximum mean discrepancy variational autoencoder (MMD-CVAE). The learned MMD-CVAE embeds personal attribute information directly in the latent space. Using these MMD-CVAE encoded features on a large-scale database of real patients in pain, our method achieves 70.7% in extreme-set classification (severe versus mild) and 47.7% in three-class recognition (severe, moderate and mild). These results correspond to relative improvements of 11.34% and 17.51%, respectively, over an acoustic representation without age-gender conditioning. Further analyses reveal that, under severe pain, females have a higher maximum jitter and a lower harmonic energy ratio between F0, H1 and H2 than males, and that the minimum values of jitter and shimmer are higher in the elderly than in the non-elderly group.
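The maximum mean discrepancy term named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel, the estimator, or how the age and gender attributes enter the network, so the Gaussian kernel, the biased estimator, the concatenation-based conditioning, and all function names below (`rbf_kernel`, `mmd_squared`, `append_condition`) are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


def append_condition(features, gender, elderly):
    """Concatenate two binary attribute columns to each feature row.

    Concatenation is one common way to condition an autoencoder;
    it is an assumption here, not a detail taken from the paper.
    """
    cond = np.column_stack([gender, elderly]).astype(features.dtype)
    return np.concatenate([features, cond], axis=1)


# Synthetic stand-ins for encoder outputs and prior samples (not real data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 8))
prior = rng.normal(size=(200, 8))
shifted = latent + 3.0

mmd_matched = mmd_squared(latent, prior)
mmd_shifted = mmd_squared(shifted, prior)

gender = rng.integers(0, 2, size=200)
elderly = rng.integers(0, 2, size=200)
conditioned = append_condition(latent, gender, elderly)
```

In this sketch, `mmd_shifted` comes out larger than `mmd_matched` because the shifted codes no longer resemble samples from the prior; an MMD-based autoencoder would add such a term to its reconstruction loss to pull the latent distribution toward the prior.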


DOI: 10.21437/Interspeech.2018-1298

Cite as: Li, J., Weng, Y., Ng, C., Lee, C. (2018) Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition. Proc. Interspeech 2018, 3438-3442, DOI: 10.21437/Interspeech.2018-1298.


@inproceedings{Li2018,
  author={Jeng-Lin Li and Yi-Ming Weng and Chip-Jin Ng and Chi-Chun Lee},
  title={Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3438--3442},
  doi={10.21437/Interspeech.2018-1298},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1298}
}