12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Emotion Classification of Infants' Cries Using Duration Ratios of Acoustic Segments

K. Kitahara, S. Michiwiki, M. Sato, S. Matsunaga, M. Yamashita, K. Shinohara

Nagasaki University, Japan

We propose an approach to classifying emotion clusters using prosodic features. As prosodic features, we use the duration ratios of specific acoustic segments in infants' cries: resonant cry segments and silence segments. We use power and pitch information to detect these segment periods, and a normal distribution as a prosodic model to approximate the occurrence probability of the segments' duration ratios. Classification experiments on two major emotion clusters are carried out using recorded cry samples from 11 infants. When the detection performance for the segment periods is about 75%, an emotion classification rate of 70.8% is achieved. Our approach using the segment duration ratios performs significantly better than the classification method using power and spectral features, indicating the effectiveness of prosodic features. Furthermore, we describe a classification method that uses both spectral and prosodic features and achieves slightly better performance (71.9%).
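The idea of modelling duration ratios with a normal distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the segment labels, the cluster names `cluster_a` and `cluster_b`, the toy numbers, the independence assumption between the two ratios, and the floor on the standard deviation are all assumptions made here for illustration. The sketch also assumes the segments have already been detected and labelled, so the power- and pitch-based detection step is not shown.

```python
import math
from statistics import mean, pstdev


def duration_ratios(segments):
    """Return (resonant_ratio, silence_ratio) for one cry sample.

    `segments` is a list of (label, duration_in_seconds) pairs; the labels
    "resonant" and "silence" are hypothetical names for the two segment types.
    """
    total = sum(duration for _, duration in segments)
    resonant = sum(d for label, d in segments if label == "resonant")
    silence = sum(d for label, d in segments if label == "silence")
    return resonant / total, silence / total


def fit_gaussians(samples):
    """Fit one (mean, std) pair per feature from a list of feature tuples."""
    # The 1e-3 floor on the standard deviation is an assumption that avoids
    # division by zero on tiny training sets.
    return [(mean(col), max(pstdev(col), 1e-3)) for col in zip(*samples)]


def log_likelihood(features, params):
    """Sum of per-feature Gaussian log densities (features assumed independent)."""
    return sum(
        -0.5 * math.log(2 * math.pi * std * std)
        - (value - mu) ** 2 / (2 * std * std)
        for value, (mu, std) in zip(features, params)
    )


def classify(features, models):
    """Pick the cluster whose Gaussian model gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(features, models[name]))


# Toy training data (made up): (resonant_ratio, silence_ratio) per cry sample.
training = {
    "cluster_a": [(0.60, 0.10), (0.70, 0.10), (0.65, 0.15)],
    "cluster_b": [(0.30, 0.40), (0.25, 0.50), (0.35, 0.45)],
}
models = {name: fit_gaussians(samples) for name, samples in training.items()}

sample = duration_ratios([("resonant", 2.0), ("silence", 1.0), ("other", 1.0)])
predicted = classify(sample, models)
```

Each cluster is represented only by the mean and standard deviation of its duration ratios, so classification reduces to comparing two log-likelihoods per cry sample.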

Bibliographic reference.  Kitahara, K. / Michiwiki, S. / Sato, M. / Matsunaga, S. / Yamashita, M. / Shinohara, K. (2011): "Emotion classification of infants' cries using duration ratios of acoustic segments", In INTERSPEECH-2011, 1573-1576.