INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Cries and Whispers - Classification of Vocal Effort in Expressive Speech

Nicolas Obin

IRCAM-CNRS UMR 9912-STMS, Paris, France

The expansion of the video game industry raises new and challenging issues for speech technologies, e.g. the development of automatic content-based speech processing and speech recognition systems for video game post-production. This paper presents a large-scale study on the classification of vocal effort in expressive speech for video games. Changes in vocal effort lead to substantial modifications in the configuration of the voice production mechanisms. In particular, registers of vocal effort especially affect voice quality, which reflects qualitative modifications of the source excitation characteristics. This study introduces robust source characteristics to measure various types of voice quality (e.g., breathy, creaky, tense) for the classification of vocal effort into whispered, normal, and shouted speech. The system is evaluated in the real-world scenario of video game production, using the complete speech recordings of a massive role-playing video game. The proposed features significantly improve classification performance over conventional MFCCs, from 81.1% to 87%. These results confirm the role of the source and of voice quality in describing changes in vocal effort.

Index Terms: speech recognition, vocal effort, voice quality, glottal source, GMM-UBM/SVM

Bibliographic reference. Obin, Nicolas (2012): "Cries and whispers - classification of vocal effort in expressive speech", In INTERSPEECH-2012, 2234-2237.