Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions

Yijia Xu, Mark Hasegawa-Johnson, Nancy McElwain


Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech is essential to the study of the dyadic social processes by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but they are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type "laugh", "cry", "fuss", "babble", or "hiccup", and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens coded by only one, but the difference is smaller than we expected, suggesting that the acoustic and contextual features used by the human labelers are not yet available to the LDA. A convolutional neural network and a hidden Markov model achieve better accuracy than LDA but a worse F-score, because they over-weight the prior; discounting the transition probability does not solve the problem.
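The trade-off the abstract describes, where a classifier that over-weights the class prior gains accuracy but loses F-score, can be illustrated with a toy two-class sketch. The label names follow the coding scheme above, but the counts and predictions are invented for illustration only:

```python
# Toy illustration (invented counts): a prior-heavy classifier beats a more
# balanced one on accuracy yet loses on macro F-score, because rare classes
# (e.g., "cry") contribute equally to macro F1 but barely to accuracy.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, label):
    # Per-class F1 from true positives, false positives, false negatives.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_f1(y_true, y_pred):
    labels = sorted(set(y_true))
    return sum(f1(y_true, y_pred, l) for l in labels) / len(labels)

# 90 "babble" tokens followed by 10 "cry" tokens (imbalanced, as infant
# outburst data typically is; these exact proportions are hypothetical).
y_true = ["babble"] * 90 + ["cry"] * 10

# Classifier A over-weights the prior: it always predicts the majority class.
pred_prior = ["babble"] * 100

# Classifier B trades some accuracy for minority-class recall: it recovers
# 8 of the 10 true "cry" tokens but mislabels 10 "babble" tokens as "cry".
pred_balanced = ["babble"] * 80 + ["cry"] * 10 + ["cry"] * 8 + ["babble"] * 2

print("prior-heavy:  acc =", accuracy(y_true, pred_prior),
      " macro-F1 =", round(macro_f1(y_true, pred_prior), 3))
print("balanced:     acc =", accuracy(y_true, pred_balanced),
      " macro-F1 =", round(macro_f1(y_true, pred_balanced), 3))
```

Here the prior-heavy classifier wins on accuracy (0.90 vs. 0.88) but its F1 for "cry" is zero, dragging its macro F-score below the balanced classifier's, which mirrors the CNN/HMM behavior reported in the abstract.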


DOI: 10.21437/Interspeech.2018-2429

Cite as: Xu, Y., Hasegawa-Johnson, M., McElwain, N. (2018) Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions. Proc. Interspeech 2018, 242-246, DOI: 10.21437/Interspeech.2018-2429.


@inproceedings{Xu2018,
  author={Yijia Xu and Mark Hasegawa-Johnson and Nancy McElwain},
  title={Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={242--246},
  doi={10.21437/Interspeech.2018-2429},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2429}
}