EUROSPEECH 2003 - INTERSPEECH 2003
A common practice in ASR for adding contextual information is to append consecutive feature frames into a single large feature vector. However, this increases the processing time of the acoustic modelling and may lead to poorly trained parameters. A possible solution is to use a Linear Discriminant Analysis (LDA) mapping to reduce the dimensionality of the feature vector, but this is not optimal, at least when the LDA classes are HMM states. This paper shows that the feature reduction problem is essentially a problem of approximating class posterior probabilities, which can be approximated using Neural Nets (NN). Several approaches using different choices of classes and NN topology are presented and tested on the AURORA 2000 digit task and on our in-car task. Results on AURORA show a significant performance increase compared to LDA, but none of the NN-based approaches outperforms LDA on our in-car task.
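As an illustration of the frame-appending practice described above, the following minimal sketch concatenates each feature frame with its neighbours. It is not taken from the paper: the function name `stack_frames`, the `context` parameter, and the edge handling (repeating the first and last frame) are assumptions made for this example only.

```python
def stack_frames(frames, context=1):
    """Concatenate each frame with `context` neighbours on each side.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Utterance edges are padded by repeating the first/last frame, which
    is an assumption of this sketch, not something stated in the paper.
    """
    if not frames:
        return []
    n = len(frames)
    stacked = []
    for t in range(n):
        vec = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            vec.extend(frames[idx])
        stacked.append(vec)
    return stacked


# Toy input: three 2-dimensional frames (hypothetical values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stacked = stack_frames(frames, context=1)
print(len(stacked), len(stacked[0]))
```

With `context` frames on each side, the vector dimension grows by a factor of `2 * context + 1`, which is why a subsequent reduction step such as LDA or an NN-based mapping becomes attractive.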
Bibliographic reference. Hilario, Joan Mari / Class, Fritz (2003): "A comparative study of some discriminative feature reduction algorithms on the AURORA 2000 and the DaimlerChrysler in-car ASR tasks", In EUROSPEECH-2003, 3101-3104.