September 22-25, 1997
It has recently been shown that normalisation of vocal tract length can significantly increase recognition accuracy in speaker-independent automatic speech recognition systems. An inherent difficulty with this technique is automatically estimating the normalisation parameter from a new speaker's speech; previous techniques have typically relied on an exhaustive search to estimate this parameter. In this paper, we present a method of normalising utterances by a linear warping of mel filter bank channels, in which the normalisation parameter is estimated by fitting formant estimates to a probabilistic model. This method is fast, computationally inexpensive, and requires only a limited amount of data for estimation. It generates normalisations which are close to those that would be found by an exhaustive search. The normalisation is applied to a phoneme recognition task using the TIMIT database, and results show a useful improvement over an unnormalised speaker-independent system.
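The abstract does not give the form of the probabilistic model or the warping convention, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that each observed formant is a reference mean multiplied by a single speaker-dependent scale plus Gaussian noise, which gives a closed-form maximum-likelihood estimate of the scale with no exhaustive search. The function names, the reference means and variances, the example formant values, and the commonly used mel formula are all assumptions introduced for illustration.

```python
import math


def hz_to_mel(f_hz):
    # A commonly used mel-scale formula (assumed here; the paper's own is not given).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_centre_frequencies(n_channels, f_low, f_high, alpha):
    """Centre frequencies (Hz) of a mel filter bank, linearly scaled by alpha."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_channels + 1)
    centres = [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_channels)]
    return [alpha * c for c in centres]


def estimate_speaker_scale(frames, means, variances):
    """Closed-form ML estimate of s under the assumed model f_k = s * mu_k + noise,
    where the noise on formant k is Gaussian with variance variances[k]."""
    num = 0.0
    den = 0.0
    for frame in frames:
        for f, mu, var in zip(frame, means, variances):
            num += f * mu / var
            den += mu * mu / var
    return num / den


# Hypothetical reference formant statistics (F1, F2, F3 in Hz).
ref_means = [500.0, 1500.0, 2500.0]
ref_vars = [100.0 ** 2, 200.0 ** 2, 300.0 ** 2]

# Hypothetical formant estimates from a new speaker, one list per frame.
frames = [[550.0, 1650.0, 2750.0], [550.0, 1650.0, 2750.0]]

scale = estimate_speaker_scale(frames, ref_means, ref_vars)
alpha = 1.0 / scale  # warp that maps the speaker back towards the reference
centres = warped_centre_frequencies(4, 0.0, 8000.0, alpha)
print(scale, alpha, centres)
```

Because the estimate is a ratio of two running sums, it can be updated frame by frame and needs only a small number of formant measurements, which is consistent in spirit with the speed and data requirements claimed above. Whether the warp is applied to the filter bank or inversely to the signal spectrum is a convention that this sketch does not settle.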
Bibliographic reference. Lincoln, Mike / Cox, Stephen / Ringland, Simon (1997): "A fast method of speaker normalisation using formant estimation", In EUROSPEECH-1997, 2095-2098.