In this paper, we study the relation among the spectra of different speakers enunciating the same sound and investigate the issue of uniform versus non-uniform scaling. This relation is of considerable interest because speaker variability is a major source of concern in many applications, including Automatic Speech Recognition (ASR). Using dynamic programming, we find mapping relations between the smoothed spectral envelopes of speakers enunciating the same sound and show that these relations are not linear but exhibit a consistent non-uniform behavior. This non-uniform behavior is also shown to vary across vowels. Through a series of experiments, we show that using the observed non-uniform relation provides better vowel normalization than simple linear scaling. All results in this paper are based on vowel data from the TIMIT, Hillenbrand et al., and North Texas databases.
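The dynamic-programming mapping mentioned above can be illustrated with a generic sketch. The abstract does not specify the local cost, the path constraints, or the envelope smoothing used in the paper, so everything below is an assumption: the function name `dp_frequency_map`, the squared-difference cost, and the three-way step pattern are hypothetical choices in the style of standard dynamic time warping, applied along the frequency axis.

```python
import numpy as np


def dp_frequency_map(env_a, env_b):
    """Align two spectral envelopes with a monotonic dynamic-programming path.

    Assumed setup (not taken from the paper): each envelope is a 1-D array of
    smoothed magnitudes on a common frequency grid, the local cost is the
    squared difference, and steps may go right, up, or diagonally.
    Returns (path, total_cost), where path is a list of (i, j) bin pairs.
    """
    a = np.asarray(env_a, dtype=float)
    b = np.asarray(env_b, dtype=float)
    n, m = len(a), len(b)

    # Local cost between every bin of A and every bin of B.
    local = (a[:, None] - b[None, :]) ** 2

    # Accumulated cost of the cheapest monotonic path ending at (i, j).
    cost = np.full((n, m), np.inf)
    cost[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:
                best = min(best, cost[i - 1, j])
            if j > 0:
                best = min(best, cost[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, cost[i - 1, j - 1])
            cost[i, j] = local[i, j] + best

    # Backtrack from the last bin pair to the first one.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, float(cost[-1, -1])


# Synthetic example: two envelopes with one peak each, at different bins.
bins = np.arange(64)
speaker_a = np.exp(-0.5 * ((bins - 20) / 4.0) ** 2)
speaker_b = np.exp(-0.5 * ((bins - 26) / 4.0) ** 2)
mapping, total = dp_frequency_map(speaker_a, speaker_b)
```

Under these assumptions, a uniform (linear) scaling between speakers would show up as a path close to a straight line through the origin of the frequency-frequency plane, while non-uniform scaling would bend the path differently in different frequency regions.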
Bibliographic reference. Harish, A. N. / Sanand, D. R. / Umesh, S. (2009): "Characterizing speaker variability using spectral envelopes of vowel sounds", In INTERSPEECH-2009, 1107-1110.