7th International Conference on Spoken Language Processing
September 16-20, 2002
An important property of speech perception is that intelligibility is preserved when the spectrum is shifted up or down along the frequency scale over a fairly wide range. To study the relationship between fundamental frequency (F0) and spectrum envelope shifts in vowel perception, we used a high-quality vocoder (STRAIGHT) to process a set of vowels spoken by 3 adult males in /hVd/ context. Identification accuracy dropped by about 30% when the spectrum envelope was scaled upwards by a factor of 2.0 and, in a separate condition, by about 50% when F0 was raised by 2 octaves. However, when the spectrum envelope and F0 were raised together, identification accuracy improved markedly compared with the conditions in which each cue was manipulated separately. This synergy between formant frequency and F0 was predicted by a model that accounts for the intelligibility of frequency-shifted vowels in terms of learned relationships between measured values of F0 and formant frequencies. A second model, based on auditory excitation patterns, predicted the main effects of F0 and spectrum envelope but not the pattern of interaction.
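The two manipulations are stated in different units. The envelope is scaled by a factor, while F0 is shifted in octaves, and a 2-octave rise corresponds to a factor of 2^2 = 4. The following is a minimal Python sketch of that arithmetic, assuming that an envelope scale factor multiplies every formant frequency uniformly. The function names and the baseline F0 and formant values are hypothetical and for illustration only. They are not the study's STRAIGHT processing or its measured data.

```python
import math


def octaves_to_ratio(octaves: float) -> float:
    """Convert a shift in octaves to a frequency ratio (1 octave = factor of 2)."""
    return 2.0 ** octaves


def ratio_to_octaves(ratio: float) -> float:
    """Convert a frequency ratio to a shift in octaves."""
    return math.log2(ratio)


def shift_vowel(f0_hz, formants_hz, f0_octaves=0.0, envelope_factor=1.0):
    """Return (F0, formants) after hypothetical F0 and envelope shifts.

    Assumption: the envelope scale factor multiplies every formant
    frequency by the same amount; the F0 shift is given in octaves.
    """
    new_f0 = f0_hz * octaves_to_ratio(f0_octaves)
    new_formants = [f * envelope_factor for f in formants_hz]
    return new_f0, new_formants


# Hypothetical adult-male vowel token; values are illustrative, not from the study.
f0, formants = 120.0, [500.0, 1500.0, 2500.0]

print("envelope x2.0 only:", shift_vowel(f0, formants, envelope_factor=2.0))
print("F0 +2 octaves only:", shift_vowel(f0, formants, f0_octaves=2.0))
print("both shifted:      ", shift_vowel(f0, formants, f0_octaves=2.0, envelope_factor=2.0))
print("envelope factor 2.0 in octaves:", ratio_to_octaves(2.0))  # → 1.0
```

Expressed on a common octave scale, the envelope condition raises the formants by 1 octave, whereas the F0 condition raises F0 by 2 octaves. In the combined condition both cues move upward, though not by the same amount.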
Bibliographic reference. Assmann, Peter F. / Nearey, Terrance M. / Scott, Jack M. (2002): "Modeling the perception of frequency-shifted vowels", In ICSLP-2002, 425-428.