INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Analysis and Classification of Speech Mode: Whispered Through Shouted

Chi Zhang, John H. L. Hansen

University of Texas at Dallas, USA

Variation in vocal effort represents one of the most challenging problems in maintaining speech system performance for coding, speech and speaker recognition. Changes in vocal effort (or mode) result in a fundamental change in speech production which is not simply a change in volume. This is the first study to collectively consider the five speech modes: whispered, soft, neutral, loud and shouted. After corpus development, analysis is performed for i) sound intensity level, ii) duration and silence percentage, iii) frame energy distribution and iv) spectral tilt. The analysis shows vocal effort dependent traits which are used to investigate speaker recognition. Matched vocal mode conditions result in a closed-set speaker ID rate of 97.62%, with mismatch vocal conditions producing 54.02%. Finally, a speech mode classification system is developed, which has a range of classification rate from 44.5% to 98.5% confusing with adjacent vocal modes. These advancements can provide improved speech/speaker modeling information, as well as classified vocal mode knowledge to improve speech and language technology in real scenarios.

Full Paper

Bibliographic reference.  Zhang, Chi / Hansen, John H. L. (2007): "Analysis and classification of speech mode: whispered through shouted", In INTERSPEECH-2007, 2289-2292.