A discriminatively derived linear transform is described which, when applied to a given set of acoustic parameters, can improve the accuracy of a speech recognition system based on hidden Markov modelling. This work builds upon a linear transform known as IMELDA, which is known to provide an effective and computationally efficient spectral representation for speech recognition, especially when the speech has been degraded. IMELDA uses linear discriminant analysis to maximise a measure of separability of the states of a given set of HMMs. The new transform is derived using a steepest descent method to minimise a measure of whole-word error rate. A weighting function is used that allows the transform to concentrate on those training tokens which yield borderline recognition decisions. Error rates for a speaker-independent E-set recognition task are reduced by a factor of 2, giving an error rate of 4.0% on the BTL alphabet database, believed to be the best result published to date using this database. Like IMELDA, the new technique applies a single transformation to all input utterance frames, and is therefore computationally efficient.
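The idea of deriving a transform by steepest descent on a smoothed error measure, with a weighting that emphasises borderline tokens, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the error measure or the weighting function, so the sketch assumes a sigmoid of the score margin as the smoothed error (its derivative then acts as a weighting that peaks at borderline decisions), replaces HMM word scoring with squared distances to class means, and adds a simple step-halving rule. All names (`borderline_weight`, `smoothed_error`, `steepest_descent`, `DATA`, `MEANS`) and all data values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def borderline_weight(margin, scale=1.0):
    # ASSUMPTION: derivative of sigmoid(margin / scale). It is largest at
    # margin 0 (a borderline decision) and decays for clear wins or losses.
    s = sigmoid(margin / scale)
    return s * (1.0 - s) / scale

def apply(A, x):
    # Apply the single linear transform A to one vector.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sq_dist(A, x, mu):
    # Squared distance between x and mu in the transformed space.
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(v * v for v in apply(A, diff))

def nearest_rival(A, x, label, means):
    # Closest incorrect class in the transformed space.
    return min((c for c in means if c != label),
               key=lambda c: sq_dist(A, x, means[c]))

def margin(A, x, label, means):
    # Positive margin means the token would be misrecognised.
    rival = nearest_rival(A, x, label, means)
    return sq_dist(A, x, means[label]) - sq_dist(A, x, means[rival])

def smoothed_error(A, data, means, scale=1.0):
    # Differentiable stand-in for the error rate (assumed form).
    return sum(sigmoid(margin(A, x, y, means) / scale) for x, y in data) / len(data)

def gradient(A, data, means, scale=1.0):
    # Gradient of smoothed_error with respect to A, using
    # d/dA ||A d||^2 = 2 (A d) d^T for each difference vector d.
    rows, cols = len(A), len(A[0])
    G = [[0.0] * cols for _ in range(rows)]
    for x, label in data:
        rival = nearest_rival(A, x, label, means)
        w = borderline_weight(margin(A, x, label, means), scale)
        for sign, mu in ((1.0, means[label]), (-1.0, means[rival])):
            diff = [xi - mi for xi, mi in zip(x, mu)]
            Ad = apply(A, diff)
            for i in range(rows):
                for j in range(cols):
                    G[i][j] += sign * w * 2.0 * Ad[i] * diff[j] / len(data)
    return G

def steepest_descent(A, data, means, step=0.1, iters=100, scale=1.0):
    # Steepest descent; the step is halved whenever a trial step fails to
    # reduce the smoothed error (an added safeguard, not from the paper).
    err = smoothed_error(A, data, means, scale)
    for _ in range(iters):
        G = gradient(A, data, means, scale)
        trial = [[a - step * g for a, g in zip(ra, rg)] for ra, rg in zip(A, G)]
        trial_err = smoothed_error(trial, data, means, scale)
        if trial_err < err:
            A, err = trial, trial_err
        else:
            step *= 0.5
    return A

# Hypothetical toy data: two classes whose second feature is noisy.
MEANS = {"B": [0.0, 0.0], "D": [1.0, 1.0]}
DATA = [([0.1, 1.0], "B"), ([-0.1, -0.5], "B"), ([0.2, 0.8], "B"),
        ([0.9, 0.1], "D"), ([1.1, 1.6], "D"), ([0.8, 0.3], "D")]
A_INIT = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity transform

A_TRAINED = steepest_descent(A_INIT, DATA, MEANS)
print("smoothed error before:", smoothed_error(A_INIT, DATA, MEANS))
print("smoothed error after: ", smoothed_error(A_TRAINED, DATA, MEANS))
```

In this sketch, tokens that are clearly right or clearly wrong contribute almost nothing to the gradient, so the update is driven by the borderline tokens. Once trained, `A_TRAINED` is applied to every input vector with the same `apply` call, which mirrors the single-transform property noted in the abstract.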
Keywords: speech recognition, discriminative training, hidden Markov modelling, HMM, linear discriminant analysis, IMELDA.
Bibliographic reference. Ayer, C. M. / Hunt, Melvyn J. / Brookes, D. M. (1993): "A discriminatively derived linear transform for improved speech recognition", In EUROSPEECH'93, 583-586.