7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Modeling the Perception of Frequency-Shifted Vowels

Peter F. Assmann (1), Terrance M. Nearey (2), Jack M. Scott (1)

(1) University of Texas at Dallas, USA; (2) University of Alberta, Canada

A significant fact about speech perception is that intelligibility is preserved when the spectrum is shifted up or down along the frequency scale across a fairly wide range. To study the relationship between fundamental frequency (F0) and spectrum envelope shifts in vowel perception, we used a high-quality vocoder (STRAIGHT) to process a set of vowels spoken by three adult males in an /hVd/ context. Identification accuracy dropped by about 30% when the spectrum envelope was scaled upwards by a factor of 2.0 and, in a separate condition, by about 50% when F0 was raised by 2 octaves. However, when spectrum envelope and F0 were increased together, identification accuracy improved markedly compared to conditions in which each cue was manipulated separately. This synergy between formant frequency and F0 was predicted by a model that accounts for the intelligibility of frequency-shifted vowels in terms of learned relationships between measured values of F0 and formant frequencies. A second model, based on auditory excitation patterns, predicted the main effects of F0 and spectrum envelope but did not predict the pattern of interaction.
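The two manipulations in the abstract are stated in different units: the spectrum envelope shift as a scale factor (2.0) and the F0 shift in octaves (2). The sketch below only illustrates that arithmetic, showing that 2 octaves correspond to a frequency ratio of 4 and that a scale factor of 2.0 corresponds to 1 octave. It is not the STRAIGHT processing used in the study. The function names and the example F0 and formant values are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def octaves_to_ratio(octaves: float) -> float:
    """Convert a shift in octaves to a multiplicative frequency ratio."""
    return 2.0 ** octaves


def ratio_to_octaves(ratio: float) -> float:
    """Convert a multiplicative frequency ratio to a shift in octaves."""
    return math.log2(ratio)


def shift_vowel(f0_hz, formants_hz, f0_octaves=0.0, envelope_ratio=1.0):
    """Return the nominal F0 and formant frequencies after shifting.

    F0 is raised by `f0_octaves`; every formant is multiplied by
    `envelope_ratio`, which models a uniform scaling of the spectrum
    envelope along the frequency axis.
    """
    new_f0 = f0_hz * octaves_to_ratio(f0_octaves)
    new_formants = [f * envelope_ratio for f in formants_hz]
    return new_f0, new_formants


# Hypothetical adult-male vowel (illustrative values, not data from the paper).
f0, formants = shift_vowel(
    120.0, [300.0, 2300.0, 3000.0], f0_octaves=2.0, envelope_ratio=2.0
)
print(f0, formants)
```

With these assumed inputs, the combined condition raises F0 by a factor of 4 while the formants are only doubled, so the two cues are shifted in the same direction but by different amounts.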

Bibliographic reference. Assmann, Peter F. / Nearey, Terrance M. / Scott, Jack M. (2002): "Modeling the perception of frequency-shifted vowels", in ICSLP-2002, 425-428.