12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency

Xing Fan, Keith W. Godin, John H. L. Hansen

University of Texas at Dallas, USA

Speakers whisper in certain circumstances to protect personal information. Because neutral and whispered speech are produced by different mechanisms, their spectral structures differ considerably, for example through formant shifts and changes in spectral slope. This study analyzes how these differences depend on speaker and phoneme by applying a Vector Taylor Series (VTS) approximation to a model of the transformation of neutral speech into whispered speech, and estimating the parameters of this model with an Expectation Maximization (EM) algorithm. The results shed light on the speaker and phoneme dependency of the shifts from neutral to whispered speech, and suggest that similarly derived model adaptation or compensation schemes for speech or speaker recognition on whispered speech will be highly speaker dependent.


Bibliographic reference. Fan, Xing / Godin, Keith W. / Hansen, John H. L. (2011): "Acoustic analysis of whispered speech for phoneme and speaker dependency", in INTERSPEECH-2011, 181-184.