ISCA Archive SCST 1990

Formant normalisation for speech recognition and vowel studies

James L. Hieronymus

Good vowel recognition and studies of vowels from different talkers require an accurate method of compensating for speaker differences in formant target frequencies. The major variance seen in the data is between males and females. However, even within the same sex class, there are large variations in the formant target frequencies for the same vowel in the same phonetic context. Various methods of compensating for speaker variation in formants were studied. Bark-scaled formants, with the Bark-scaled fundamental frequency subtracted from the first formant, were tried first. In spite of recently published papers on the efficacy of this technique, it was found inadequate: the transformations were incapable of improving the clusters of the cardinal vowels, for example. A modification of the Gerstman technique, which determines the speaker's formant range and then transforms it into an "ideal" talker's range, was found to account for most of the variance due to different talkers. This technique was applied to vowel-in-context studies of American English. Formant ranges were studied for 125 talkers of General American English. Plots of formant ranges for males and females showed interesting patterns. The lower limit of the second formant was not very different between the sexes, while the lower limit of the first formant was lower for males. Both the first and second formant maxima were larger for females. The modified Gerstman transformation was able to superimpose the formant targets for the same vowel in the same context from different talkers into the same region of F1, F2 space. There remained some residual variance between males and females, even after the transformation. These trends are shown in a series of plots of vowel target frequency data.
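
The range-based transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the linear mapping below is an assumption: each formant is rescaled from the speaker's own minimum-to-maximum range onto a reference ("ideal") talker's range. The function names, the reference range, and all frequency values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence, Tuple

Range = Tuple[float, float]


def formant_range(values_hz: Sequence[float]) -> Range:
    """Return the (minimum, maximum) of one formant over a speaker's vowels."""
    return min(values_hz), max(values_hz)


def map_to_reference_range(f_hz: float, speaker: Range, reference: Range) -> float:
    """Linearly map a formant value from the speaker's range to a reference range.

    Assumption: a plain linear rescaling per formant. The paper's modified
    Gerstman method may differ in detail.
    """
    lo, hi = speaker
    ref_lo, ref_hi = reference
    if hi <= lo:
        raise ValueError("speaker range must have max > min")
    return ref_lo + (f_hz - lo) * (ref_hi - ref_lo) / (hi - lo)


# Hypothetical F1 measurements (Hz) for one speaker; not data from the paper.
speaker_f1 = [300.0, 450.0, 500.0, 700.0]
# Hypothetical reference ("ideal") talker F1 range in Hz.
ideal_f1_range = (250.0, 850.0)

f1_range = formant_range(speaker_f1)
normalised_f1 = [
    map_to_reference_range(f, f1_range, ideal_f1_range) for f in speaker_f1
]
```

The same mapping would be applied separately to F2 with its own speaker and reference ranges. Because the mapping is linear, the speaker's extreme values land exactly on the reference limits and the relative spacing of vowels within the range is preserved.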


Cite as: Hieronymus, J.L. (1990) Formant normalisation for speech recognition and vowel studies. Proc. ESCA Workshop on Speaker Characterization in Speech Technology, 127-130

@inproceedings{hieronymus90_scst,
  author={James L. Hieronymus},
  title={{Formant normalisation for speech recognition and vowel studies}},
  year=1990,
  booktitle={Proc. ESCA Workshop on Speaker Characterization in Speech Technology},
  pages={127--130}
}