EUROSPEECH 2003 - INTERSPEECH 2003
This paper applies the recently proposed SPAM models to acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision matrices and the means (although this paper focuses on the case of unconstrained means). They include diagonal covariance, full covariance, MLLT, and EMLLT models as special cases. Adaptation is carried out with maximum likelihood estimation of the means and the feature-space transform under the SPAM model. This paper presents the first experimental evidence that SPAM models can achieve significant word-error-rate improvements over state-of-the-art diagonal covariance models, even when those diagonal models are given the benefit of choosing the optimal number of Gaussians (according to the Bayesian Information Criterion). This paper is also the first to apply SPAM models in a SAT context. All experiments are performed on the IBM "Superhuman" speech corpus, a challenging and diverse conversational speech test set that includes the Switchboard portion of the 1998 Hub5e evaluation data set.
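The subspace constraint on precision matrices described in the abstract can be illustrated with a small sketch. It assumes the usual formulation in which each Gaussian's precision matrix is a weighted sum of symmetric basis matrices shared across all Gaussians; the function names, the toy weights, and the two-dimensional setting are hypothetical and are not taken from the paper. The sketch shows how a diagonal covariance model and an EMLLT-style model with rank-one basis matrices arise as special cases.

```python
# Illustrative sketch only: helper names and toy numbers are assumptions,
# not the authors' implementation.

def outer(a, b):
    """Outer product of two vectors, returned as a nested list."""
    return [[x * y for y in b] for x in a]

def unit(d, k):
    """k-th standard basis vector in d dimensions."""
    return [1.0 if i == k else 0.0 for i in range(d)]

def precision_from_basis(weights, basis):
    """Return P = sum_k weights[k] * basis[k] for d x d matrices."""
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for w, S in zip(weights, basis):
        for i in range(d):
            for j in range(d):
                P[i][j] += w * S[i][j]
    return P

d = 2

# Diagonal covariance as a special case: basis matrices e_k e_k^T,
# so the weights are the diagonal precisions of one Gaussian.
diag_basis = [outer(unit(d, k), unit(d, k)) for k in range(d)]
P_diag = precision_from_basis([4.0, 0.25], diag_basis)

# EMLLT-style special case: rank-one basis matrices a_k a_k^T, with more
# directions (3) than feature dimensions (2).
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emllt_basis = [outer(a, a) for a in directions]
P_emllt = precision_from_basis([1.0, 2.0, 0.5], emllt_basis)

print(P_diag)   # [[4.0, 0.0], [0.0, 0.25]]
print(P_emllt)  # [[1.5, 0.5], [0.5, 2.5]]
```

In this picture the basis matrices are shared by all Gaussians and only the weight vector is specific to each Gaussian, so the size of the basis controls the trade-off between a diagonal model and a full covariance model.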
Bibliographic reference. Axelrod, Scott / Goel, Vaibhava / Kingsbury, Brian / Visweswariah, Karthik / Gopinath, Ramesh (2003): "Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices", In EUROSPEECH-2003, 1613-1616.