EUROSPEECH 2003 - INTERSPEECH 2003
In the tandem feature extraction scheme, a Multi-Layer Perceptron (MLP) with a softmax output layer is discriminatively trained on a labeled database to estimate context-independent phoneme posterior probabilities. The MLP outputs, after a nonlinear transformation and Principal Component Analysis (PCA), are used as features in a Gaussian Mixture Model (GMM) based recognizer. The baseline tandem system is trained on 56 Context Independent (CI) phoneme targets. In this paper we examine alternatives to CI phoneme targets by grouping phonemes using a priori and data-derived knowledge. On a connected digit recognition task, we achieve performance comparable to the baseline system using fewer data-derived classes.
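The feature path described above (softmax posteriors, nonlinear transformation, PCA) can be illustrated with a minimal sketch. It is not the authors' implementation: the logarithm is assumed as the nonlinearity because the abstract does not name one, the function names (`softmax`, `log_transform`, `pca_basis`, `tandem_features`) are hypothetical, PCA is approximated by power iteration with deflation to stay within the standard library, and the basis is estimated on the same frames it projects, whereas a real system would estimate it on training data.

```python
import math

def softmax(activations):
    """Turn MLP output activations into posterior probabilities."""
    peak = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def log_transform(posteriors, floor=1e-10):
    """Assumed nonlinearity: floored log of the posteriors."""
    return [math.log(max(p, floor)) for p in posteriors]

def _matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def pca_basis(frames, n_components, n_iter=200):
    """Estimate the mean and leading principal directions (power iteration)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    centered = [[f[d] - mean[d] for d in range(dim)] for f in frames]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    basis = []
    for k in range(n_components):
        # Deterministic, unit-norm starting vector.
        vec = [1.0 / (d + 1 + k) for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        for _ in range(n_iter):
            nxt = _matvec(cov, vec)
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # no variance left in this direction
                break
            vec = [v / norm for v in nxt]
        eigval = sum(v * w for v, w in zip(vec, _matvec(cov, vec)))
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
        basis.append(vec)
    return mean, basis

def tandem_features(activation_frames, n_components):
    """Softmax -> log -> mean removal -> projection onto the PCA basis."""
    logp = [log_transform(softmax(a)) for a in activation_frames]
    mean, basis = pca_basis(logp, n_components)
    return [[sum((x - m) * b for x, m, b in zip(frame, mean, vec))
             for vec in basis]
            for frame in logp]

# Toy example: 6 frames of activations for 4 hypothetical target classes.
frames = [
    [2.0, 0.1, -1.0, 0.3],
    [0.2, 1.5, 0.0, -0.5],
    [-1.0, 0.3, 2.2, 0.1],
    [0.5, 0.5, 0.5, 3.0],
    [1.8, -0.2, 0.1, 0.0],
    [0.0, 2.0, -0.3, 0.4],
]
features = tandem_features(frames, n_components=2)
```

Each row of `features` would be one frame's input vector to the GMM-based recognizer. Changing the number of target classes changes only the width of each activation frame, which is the dimension the paper varies when it replaces the 56 CI phoneme targets with grouped classes.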
Bibliographic reference. Sivadas, Sunil / Hermansky, Hynek (2003): "In search of target class definition in tandem feature extraction", In EUROSPEECH-2003, 837-840.