7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Large Vocabulary Conversational Speech Recognition with the Extended Maximum Likelihood Linear Transformation (EMLLT) Model

Jing Huang, Vaibhava Goel, Ramesh Gopinath, Brian Kingsbury, Peder Olsen, Karthik Visweswariah

IBM T. J. Watson Research Center, USA

This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, the precisions (inverse covariances), and the feature space under the EMLLT model. The paper presents the first experimental evidence that the EMLLT model (in both VTL and VTL+SAT training contexts) can achieve significant word-error-rate improvements over a state-of-the-art diagonal covariance model on a difficult large-vocabulary conversational speech recognition task. The improvements were on the order of 1% absolute in multiple scenarios.

Bibliographic reference. Huang, Jing / Goel, Vaibhava / Gopinath, Ramesh / Kingsbury, Brian / Olsen, Peder / Visweswariah, Karthik (2002): "Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model", in ICSLP-2002, 2597-2600.