In a recent paper, we reported promising automatic speech recognition results obtained by appending spectral entropy features to PLP features. In the present paper, we use spectral entropy features alongside PLP features in a multi-stream framework. In our multi-stream hidden Markov model/artificial neural network system, we train a separate multi-layer perceptron (MLP) for each of three streams: PLP features, spectral entropy features, and the concatenation of the two. The output posteriors from these three MLPs are combined with weights inversely proportional to the output entropies of the respective MLPs. On the Numbers95 database, this approach yields a considerable improvement under both clean and noisy conditions compared with simply appending the features. Further, in a multi-stream Tandem system, we apply the same inverse entropy weighting to combine the outputs of the MLPs before the softmax non-linearity. Feeding the combined outputs, after decorrelation, to the standard hidden Markov model/Gaussian mixture model system gives a 9.2% relative error reduction compared with the baseline.
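The inverse entropy weighting described above can be illustrated with a minimal sketch for a single frame. This is not the authors' implementation: the function names, the normalisation of the weights to sum to one, the small `eps` guard, and the example posterior values are all assumptions made for illustration. The sketch covers only the combination of posteriors, not the Tandem variant that combines the outputs before the softmax.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def inverse_entropy_weights(streams, eps=1e-8):
    """Weights proportional to 1/entropy of each stream, normalised to sum to 1.

    eps is an assumed guard against division by zero when a stream is
    fully confident (entropy of exactly zero).
    """
    inverse = [1.0 / (entropy(p) + eps) for p in streams]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(streams):
    """Weighted sum of the per-stream posteriors for one frame."""
    weights = inverse_entropy_weights(streams)
    n_classes = len(streams[0])
    return [
        sum(w * p[k] for w, p in zip(weights, streams))
        for k in range(n_classes)
    ]


# Hypothetical posteriors over three classes for a single frame.
plp_posteriors = [0.90, 0.05, 0.05]       # confident stream, low entropy
entropy_posteriors = [0.40, 0.30, 0.30]   # uncertain stream, high entropy
concat_posteriors = [0.70, 0.20, 0.10]

streams = [plp_posteriors, entropy_posteriors, concat_posteriors]
weights = inverse_entropy_weights(streams)
combined = combine(streams)
```

In this sketch the most confident stream (lowest entropy) receives the largest weight, and because the weights sum to one, the combined vector remains a valid probability distribution.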
Cite as: Misra, H., Bourlard, H. (2005) Spectral entropy feature in full-combination multi-stream for robust ASR. Proc. Interspeech 2005, 2633-2636, doi: 10.21437/Interspeech.2005-247
@inproceedings{misra05_interspeech,
  author={Hemant Misra and Hervé Bourlard},
  title={{Spectral entropy feature in full-combination multi-stream for robust ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2633--2636},
  doi={10.21437/Interspeech.2005-247}
}