Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Acoustic Modelling Using Modular/Ensemble Combinations of Heterogeneous Neural Networks

Christos A. Antoniou, T. Jeff Reynolds

Department of Computer Science, University of Essex, Colchester, UK

For some time we have been investigating the use of modular/ensemble neural networks to model phones, a commonly chosen acoustic unit for speech. We have demonstrated the advantage of using separately trained MLPs to estimate each phone's posterior probability given a sequence of feature vectors representing the expression of the phone over some window in time. In this paper we show how MLPs trained on different feature vectors, derived from different pre-processing techniques, may be combined to produce better estimates of phone posteriors and hence lower word error rates. We also show how calculated broad-class posterior probabilities may be used to provide contextual information to train further nets. The combination of these techniques results in significant improvements in phone classification accuracy and word error rate on the TIMIT corpus.
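As an illustration only, the two ideas in the abstract can be sketched in a few lines of Python. The abstract does not state which combination rule the authors use, nor how the broad-class posteriors are calculated, so this sketch assumes (a) a weighted arithmetic or geometric mean of the per-stream phone posteriors and (b) broad-class posteriors obtained by summing the posteriors of the member phones. The function names, the three-phone inventory, the phone-to-class mapping and all numbers are hypothetical.

```python
import math


def combine_posteriors(streams, weights=None, rule="mean"):
    """Combine phone posterior vectors produced by several networks.

    streams: list of equal-length lists, one posterior vector per network.
    weights: optional per-network weights (assumed to sum to 1).
    rule:    "mean" for a weighted arithmetic mean, "product" for a
             weighted geometric mean; the result is renormalised.
    NOTE: both rules are assumptions, not the rule used in the paper.
    """
    if not streams:
        raise ValueError("need at least one posterior vector")
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("all posterior vectors must have the same length")
    if weights is None:
        weights = [1.0 / len(streams)] * len(streams)

    if rule == "mean":
        combined = [sum(w * s[k] for w, s in zip(weights, streams))
                    for k in range(n)]
    elif rule == "product":
        eps = 1e-12  # floor to avoid log(0)
        combined = [math.exp(sum(w * math.log(max(s[k], eps))
                                 for w, s in zip(weights, streams)))
                    for k in range(n)]
    else:
        raise ValueError("rule must be 'mean' or 'product'")

    total = sum(combined)
    return [c / total for c in combined]


def broad_class_posteriors(phone_posteriors, phone_to_class):
    """Sum phone posteriors within each broad class (assumed calculation)."""
    classes = {}
    for k, p in enumerate(phone_posteriors):
        cls = phone_to_class[k]
        classes[cls] = classes.get(cls, 0.0) + p
    return classes


# Hypothetical posteriors over three phones from two networks, each
# trained on features from a different pre-processing front end.
stream_a = [0.6, 0.3, 0.1]
stream_b = [0.4, 0.5, 0.1]
combined = combine_posteriors([stream_a, stream_b])
best_phone = max(range(len(combined)), key=combined.__getitem__)

# Hypothetical mapping: phones 0 and 1 are vowels, phone 2 is a fricative.
phone_to_class = ["vowel", "vowel", "fricative"]
broad = broad_class_posteriors(combined, phone_to_class)

# Broad-class posteriors appended to a (hypothetical) feature vector,
# giving contextual input on which a further network could be trained.
features = [0.12, -0.80, 0.33]
augmented = features + [broad[c] for c in sorted(broad)]
```

Under these assumptions the arithmetic mean keeps the combined estimate a valid distribution without renormalisation, while the geometric mean penalises phones that any single network considers unlikely; which behaviour suits a given ensemble is an empirical question.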


Bibliographic reference. Antoniou, Christos A. / Reynolds, T. Jeff (2000): "Acoustic modelling using modular/ensemble combinations of heterogeneous neural networks", in ICSLP-2000, vol. 1, 282-285.