8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


In Search Of Target Class Definition In Tandem Feature Extraction

Sunil Sivadas, Hynek Hermansky

Oregon Health & Science University, USA

In the tandem feature extraction scheme, a Multi-Layer Perceptron (MLP) with a softmax output layer is discriminatively trained to estimate context-independent phoneme posterior probabilities on a labeled database. The outputs of the MLP, after a nonlinear transformation and Principal Component Analysis (PCA), are used as features in a Gaussian Mixture Model (GMM) based recognizer. The baseline tandem system is trained on 56 Context Independent (CI) phoneme targets. In this paper we examine alternatives to CI phoneme targets by grouping phonemes using a priori and data-derived knowledge. On a connected digit recognition task, we achieve performance comparable to the baseline system using a smaller number of data-derived classes.
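
The post-processing stage described above (MLP posteriors, then a nonlinear transformation, then PCA) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not name the nonlinearity, so a logarithm is assumed here; the posteriors are random stand-ins for real MLP outputs; and the function names, the 24-component output size, and the 4-phonemes-per-class grouping are all hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, used here only to fabricate posterior-like inputs."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def tandem_features(posteriors, n_components, eps=1e-10):
    """Log-transform posteriors (assumed nonlinearity), then project with PCA."""
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0)
    # PCA via eigendecomposition of the covariance matrix.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]

def map_targets(phoneme_labels, phoneme_to_class):
    """Relabel per-frame phoneme targets with coarser class targets."""
    return np.asarray(phoneme_to_class)[phoneme_labels]

rng = np.random.default_rng(0)

# Stand-in for MLP outputs: 200 frames, 56 CI phoneme posteriors per frame.
posteriors = softmax(rng.normal(size=(200, 56)))
features = tandem_features(posteriors, n_components=24)

# Hypothetical grouping: every 4 consecutive phoneme indices share one class.
phoneme_to_class = np.arange(56) // 4
phoneme_labels = rng.integers(0, 56, size=200)
class_labels = map_targets(phoneme_labels, phoneme_to_class)
```

In a real system the grouped labels would serve as the MLP training targets, and the PCA basis would be estimated on training data and then reused unchanged on test data.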


Bibliographic reference.  Sivadas, Sunil / Hermansky, Hynek (2003): "In search of target class definition in tandem feature extraction", In EUROSPEECH-2003, 837-840.