7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out by maximum likelihood estimation of linear transforms for the means, the precisions (inverse covariances), and the feature space under the EMLLT model. The paper presents the first experimental evidence that the EMLLT model, in both VTL and VTL+SAT training contexts, can achieve significant word-error-rate improvements over a state-of-the-art diagonal covariance model on a difficult large-vocabulary conversational speech recognition task. The improvements were on the order of 1% absolute in multiple scenarios.
Bibliographic reference. Huang, Jing / Goel, Vaibhava / Gopinath, Ramesh / Kingsbury, Brian / Olsen, Peder / Visweswariah, Karthik (2002): "Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model", In ICSLP-2002, 2597-2600.