In this paper, we propose to incorporate the widely used Multiple Layer Perceptron (MLP) features and discriminative training (DT) into our recent data-sampling based ensemble acoustic models to further improve the quality of the individual models as well as the diversity among the models. We also propose applying speaker-model distance based speaker clustering for data sampling to construct ensembles of acoustic models for speaker independent speech recognition. By using these methods on the speaker independent TIMIT phone recognition task, we have obtained a phoneme recognition accuracy of 77.1% on the TIMIT complete test set, an absolute improvement of 5.4% over our conventional HMM baseline system, making this one of the best reported results on the TIMIT continuous phoneme recognition task.
Bibliographic reference. Chen, Xin / Zhao, Yunxin (2010): "Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling", In INTERSPEECH-2010, 1349-1352.