14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Ensemble of Machine Learning and Acoustic Segment Model Techniques for Speech Emotion and Autism Spectrum Disorders Recognition

Hung-yi Lee (1), Ting-yao Hu (2), How Jing (1), Yun-Fan Chang (1), Yu Tsao (1), Yu-Cheng Kao (3), Tsang-Long Pao (3)

(1) Academia Sinica, Taiwan
(2) National Taiwan University, Taiwan
(3) Tatung University, Taiwan

This study investigates the classification of emotion and autism spectrum disorders from speech utterances using ensemble classification techniques. We first explore the performance of three well-known machine learning techniques, namely support vector machines (SVM), deep neural networks (DNN), and k-nearest neighbours (KNN), with acoustic features extracted by the openSMILE feature extractor. In addition, we propose an acoustic segment model (ASM) technique, which incorporates the temporal information of speech signals to perform classification. A set of ASMs is automatically learned for each category of emotion and autism spectrum disorders; the ASM sets then decode an input utterance into a series of acoustic patterns, from which the system determines the category of that utterance. Our ensemble system combines the machine learning and ASM techniques. The evaluations are conducted using the data sets provided by the organizers of the INTERSPEECH 2013 Computational Paralinguistics Challenge.
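The abstract does not say how the individual systems are combined. As an illustration only, the sketch below assumes score-level fusion: each subsystem (SVM, DNN, KNN, ASM) emits one score per category, the scores are averaged with per-system weights, and the highest fused score wins. The function names, weights, labels, and score values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def fuse_scores(scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-system class scores (assumed fusion rule)."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    for name, class_scores in scores.items():
        if len(class_scores) != n_classes:
            raise ValueError(f"{name} has a different number of classes")
        for i, score in enumerate(class_scores):
            # Normalise by the weight total so the fused scores stay comparable.
            fused[i] += weights[name] * score / total
    return fused


def predict(scores: Dict[str, Sequence[float]],
            weights: Dict[str, float],
            labels: Sequence[str]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_scores(scores, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical two-class example; every number here is made up.
labels = ["typical", "atypical"]
example_scores = {
    "svm": [0.6, 0.4],
    "dnn": [0.3, 0.7],
    "knn": [0.4, 0.6],
    "asm": [0.2, 0.8],
}
equal_weights = {"svm": 1.0, "dnn": 1.0, "knn": 1.0, "asm": 1.0}

print(predict(example_scores, equal_weights, labels))  # prints "atypical"
```

In a setup like this, the weights would normally be tuned on a development set; equal weights are used here only to keep the example small.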

Bibliographic reference. Lee, Hung-yi / Hu, Ting-yao / Jing, How / Chang, Yun-Fan / Tsao, Yu / Kao, Yu-Cheng / Pao, Tsang-Long (2013): "Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition", in INTERSPEECH-2013, 215-219.