EUROSPEECH 2003 - INTERSPEECH 2003
This work investigates linear and nonlinear feature transformations in the ASR front end. The unsupervised transformations were based on principal component analysis (PCA) and independent component analysis (ICA); the discriminative transformations were based on linear discriminant analysis (LDA) and multilayer perceptron (MLP) networks. The acoustic models were trained on a subset of the HUB5 training data and tested on the OGI Numbers corpus. The baseline feature vector consisted of the PLP cepstrum and energy with first- and second-order deltas. None of the feature transformations outperformed the baseline when used alone, but the word error rate improved when the baseline features were combined with a feature transformation stream. Two combination methods were tested: feature vector concatenation and n-best list combination using ROVER. The best results were obtained by combining the baseline PLP cepstrum with the MLP-based feature transformation, which reduced the word error rate in the number recognition task from 4.1% to 3.1%.
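The word error rate figures quoted above can be put in context with a small sketch. It is not taken from the paper: the function names `word_error_rate` and `relative_reduction` are hypothetical, the example digit strings are invented for illustration, and the sketch assumes that the reported 4.1 and 3.1 are percentages. It uses the conventional definition of word error rate (substitutions, deletions and insertions divided by the number of reference words), computed with a word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent.

    WER = (substitutions + deletions + insertions) / number of reference words,
    obtained here from the word-level Levenshtein distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative reduction in percent between two error rates."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Invented example: one substitution and one deletion in four reference words.
    print(word_error_rate("one two three four", "one too three"))
    # Assumption: the abstract's 4.1 and 3.1 are percentages.
    print(round(relative_reduction(4.1, 3.1), 1))
```

Under that assumption, an absolute drop from 4.1% to 3.1% is one percentage point, or roughly a 24% relative reduction in word error rate.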
Bibliographic reference. Somervuo, Panu / Chen, Barry / Zhu, Qifeng (2003): "Feature transformations and combinations for improving ASR performance", In EUROSPEECH-2003, 477-480.