Interspeech'2005 - Eurospeech
Spectral envelopes that use (warped or perceptual) linear prediction or the minimum variance distortionless response as the underlying linear parametric model are widely used in speech recognition systems, where the frequency resolution of the spectrum, namely the model order (MO), is kept constant. Modeling different types of phonemes, such as vowels and fricatives, with the same frequency resolution might not lead to the best possible performance. Possible reasons are that the important parts of different phonemes lie in different frequency regions, that the fundamental frequency varies between speakers, and that the signal-to-noise ratio varies strongly. To address this problem, we propose to vary the MO frame by frame according to a control factor. In our case, the control factor can be either a relation between autocorrelation coefficients or the spectral entropy. Experimental results on the Translanguage English Database show a relative reduction in word error rate of 2.4% compared to a fixed MO and of 4.2% compared to traditional Mel-frequency cepstral coefficients.
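To make the idea of a frame-wise control factor concrete, the following minimal sketch computes the spectral entropy of a single frame and maps it linearly onto a model order. It is an illustration under stated assumptions, not the method of the paper: the function names, the order range of 8 to 30, the linear mapping, and the direction of the mapping (flatter, high-entropy frames receive a lower order) are all hypothetical choices, since the abstract does not specify them.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Entropy of the normalized power spectrum, scaled to the range [0, 1]."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    power = power_spectrum(frame)
    total = sum(power)
    if total <= 0.0:
        return 1.0  # assumption: treat a silent frame as maximally flat
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Divide by the entropy of a uniform distribution, then clamp rounding noise.
    return min(1.0, max(0.0, entropy / math.log(len(probs))))


def select_model_order(frame, min_order=8, max_order=30):
    """Map the control factor linearly onto an integer model order.

    Assumption (not taken from the paper): a peaky, low-entropy spectrum
    gets a high order, and a flat, high-entropy spectrum gets a low order.
    The default order range is likewise hypothetical.
    """
    h = spectral_entropy(frame)
    return int(round(max_order - h * (max_order - min_order)))
```

Under this mapping, a frame containing a pure tone is assigned the highest order and a frame containing a single impulse the lowest; a real front end would then fit the spectral envelope of each frame with the selected order.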
Bibliographic reference. Wölfel, Matthias (2005): "Frame based model order selection of spectral envelopes", In INTERSPEECH-2005, 205-208.