12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Study on Speaker Normalized MLP Features in LVCSR

Zoltán Tüske, Christian Plahl, Ralf Schlüter

RWTH Aachen University, Germany

Recent Large Vocabulary Continuous Speech Recognition (LVCSR) systems apply different normalization methods to reduce the influence of speaker variability on the acoustic models. In this paper, we investigate the use of Vocal Tract Length Normalization (VTLN) and Speaker Adaptive Training (SAT) in Multi-Layer Perceptron (MLP) feature extraction on an English task. Each normalization method yields significant improvements, and stacking the normalizations gives further gains. In further experiments with features transformed by Constrained Maximum Likelihood Linear Regression (CMLLR) based SAT as a possible MLP input, the MLP could not consistently take advantage of SAT as it does in the case of VTLN.
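
As a rough illustration of the two speaker normalizations named in the abstract, the sketch below shows VTLN as a frequency-axis warp and CMLLR as a per-speaker affine transform of a feature vector. It is not taken from the paper: the piecewise-linear warping function, the `knee_ratio` parameter, and the function names `vtln_warp` and `cmllr_transform` are assumptions chosen for illustration, and the abstract does not state which warping function or estimation procedure the authors used.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, knee_ratio=0.875):
    """Warp one frequency with a piecewise-linear VTLN function.

    Assumption: a common two-segment form. Below the knee the frequency is
    scaled by the speaker-specific factor alpha; above it, a second linear
    segment maps the remaining band so that the Nyquist frequency is fixed.
    """
    knee = knee_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= knee:
        return alpha * freq_hz
    # Second segment maps [knee, nyquist] linearly onto [alpha * knee, nyquist].
    slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return alpha * knee + slope * (freq_hz - knee)


def cmllr_transform(frame, A, b):
    """Apply a per-speaker affine feature transform x' = A x + b.

    CMLLR acts in feature space as an affine map; estimating A and b
    (maximum likelihood under the acoustic model) is not shown here.
    """
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, frame)) + b_i
        for row, b_i in zip(A, b)
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(vtln_warp(1000.0, alpha=1.1))
    print(cmllr_transform([1.0, 2.0], [[2.0, 0.0], [0.0, 0.5]], [1.0, -1.0]))
```

In a stacked setup of the kind the abstract describes, the warp would typically be applied to the filterbank during feature extraction, and the affine transform to the resulting feature vectors before they are passed on; how the paper combines them in detail is not stated in the abstract.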


Bibliographic reference. Tüske, Zoltán / Plahl, Christian / Schlüter, Ralf (2011): "A study on speaker normalized MLP features in LVCSR", in INTERSPEECH-2011, 1089-1092.