Whisper is used by speakers in certain circumstances to protect personal information. Due to the differences in production mechanisms between neutral and whispered speech, there are considerable differences between the spectral structure of neutral and whispered speech, such as formant shifts and shifts in spectral slope. This study analyzes the dependency of these differences on speakers and phonemes by applying a Vector Taylor Series (VTS) approximation to a model of the transformation of neutral speech into whispered speech, and estimating the parameters of this model using an Expectation Maximization (EM) algorithm. The results from this study shed light on the speaker and phoneme dependency of the shifts of neutral to whisper speech, and suggest that similarly derived model adaptation or compensation schemes for whisper speech/speaker recognition will be highly speaker dependent.
Bibliographic reference. Fan, Xing / Godin, Keith W. / Hansen, John H. L. (2011): "Acoustic analysis of whispered speech for phoneme and speaker dependency", In INTERSPEECH-2011, 181-184.