12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Hierarchical Tandem Features for ASR in Mandarin

Joel Pinto, Mathew Magimai-Doss, Hervé Bourlard

Idiap Research Institute, Switzerland

We apply multilayer perceptron (MLP) based hierarchical Tandem features to large vocabulary continuous speech recognition in Mandarin. Hierarchical Tandem features are estimated using a cascade of two MLP classifiers that are trained independently. The first classifier is trained on perceptual linear predictive coefficients with a 90 ms temporal context. The second classifier is trained on the phonetic class conditional probabilities estimated by the first MLP, but with a longer temporal context of about 150 ms. Experiments on the Mandarin DARPA GALE eval06 dataset show a significant reduction in character error rate (7.6% relative) when hierarchical Tandem features are used in place of conventional Tandem features.
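
The cascade described above can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows how the two temporal contexts translate into stacked input frames and how the output of the first classifier becomes the input of the second. The 10 ms frame shift, the 13-dimensional input, the 5 output classes, the edge padding, and the random linear "classifiers" standing in for trained MLPs are all assumptions made for illustration, and every function name is hypothetical. The post-processing that turns posteriors into Tandem features is omitted.

```python
import math
import random

FRAME_SHIFT_MS = 10  # assumption: the abstract does not state the frame shift


def context_frames(context_ms, shift_ms=FRAME_SHIFT_MS):
    """Number of frames spanned by a temporal context of context_ms."""
    return context_ms // shift_ms


def stack_context(frames, width):
    """Concatenate `width` consecutive frames centred on each frame.

    Edges are padded by repeating the first or last frame (an assumption).
    """
    half = width // 2
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - half, t - half + width):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def softmax(scores):
    """Normalise scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_classifier(n_in, n_out, rng):
    """Stand-in for a trained MLP: a random linear layer plus softmax."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]

    def classify(x):
        return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

    return classify


def hierarchical_posteriors(plp, n_classes, rng):
    """Run the two-stage cascade over a sequence of acoustic frames."""
    w1 = context_frames(90)    # first stage: 90 ms of acoustic features
    w2 = context_frames(150)   # second stage: 150 ms of first-stage posteriors
    mlp1 = make_toy_classifier(w1 * len(plp[0]), n_classes, rng)
    first = [mlp1(x) for x in stack_context(plp, w1)]
    mlp2 = make_toy_classifier(w2 * n_classes, n_classes, rng)
    return [mlp2(x) for x in stack_context(first, w2)]


rng = random.Random(0)
# Toy input: 20 frames of 13 random values in place of real PLP coefficients.
plp = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
posteriors = hierarchical_posteriors(plp, n_classes=5, rng=rng)
```

Under the assumed 10 ms frame shift, the first stage sees 9 stacked frames and the second sees 15. Because the second stage operates on posteriors that already summarise 9 frames each, its effective acoustic span is wider than either window alone.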


Bibliographic reference. Pinto, Joel / Magimai-Doss, Mathew / Bourlard, Hervé (2011): "Hierarchical tandem features for ASR in Mandarin", In INTERSPEECH-2011, 1241-1244.