We performed a behavioral experiment to demonstrate the effect of spectral slope on the perception of speaker size, and we developed an auditory model based on the dynamic compressive gammachirp filterbank (dcGC-FB) to explain the results. STRAIGHT was used to generate “unvoiced” and “whispered” versions of naturally recorded words; the only difference was that the spectral slope of the whispered words was tilted up by 6 dB/octave relative to that of the unvoiced words. The experiment confirmed that the whispered words are perceived as coming from smaller speakers. The auditory model uses the tonotopic excitation pattern, Ep, as the internal representation of speech sounds. The model is much more effective when the gradient of the excitation pattern, ∇Ep, is included in the size discrimination process; including the gradient is particularly useful for explaining individual subject variability.
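As a rough illustration of the two quantities named above, the following sketch applies a 6 dB/octave upward tilt to a set of channel levels and then takes a finite-difference gradient along the channel axis. It is not the authors' STRAIGHT or dcGC-FB processing: the function names, the 1 kHz pivot frequency, the octave-spaced channels, and the flat 60 dB input are all assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def tilt_gain_db(freq_hz, ref_hz=1000.0, slope_db_per_octave=6.0):
    """Gain (dB) of a tilt that is linear in log2 frequency.

    ref_hz is a hypothetical pivot frequency where the gain is 0 dB.
    """
    return slope_db_per_octave * math.log2(freq_hz / ref_hz)

def apply_tilt(freqs_hz, levels_db, ref_hz=1000.0, slope_db_per_octave=6.0):
    """Add the tilt gain to each channel level (all values in dB)."""
    return [
        level + tilt_gain_db(freq, ref_hz, slope_db_per_octave)
        for freq, level in zip(freqs_hz, levels_db)
    ]

def gradient(values):
    """Finite-difference gradient along the channel index.

    Central differences are used for interior channels and one-sided
    differences at the two ends. This is a stand-in for the gradient
    of an excitation pattern, not the paper's definition.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two channels")
    result = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        result.append((values[hi] - values[lo]) / (hi - lo))
    return result

# Assumed toy input: octave-spaced channels with a flat 60 dB level.
freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
flat_levels = [60.0] * len(freqs)

tilted_levels = apply_tilt(freqs, flat_levels)
tilted_gradient = gradient(tilted_levels)
```

With octave-spaced channels, the tilted levels rise by 6 dB per channel, so the gradient is constant across channels, whereas the gradient of the flat input is zero. A real excitation pattern would be computed on an auditory (tonotopic) frequency axis, so the gradient would not be constant there.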
Bibliographic reference. Yamamoto, Kodai / Irino, Toshio / Nisimura, Ryuichi / Kawahara, Hideki / Patterson, Roy D. (2015): "How the slope of the speech spectrum affects the perception of speaker size", In INTERSPEECH-2015, 1556-1560.