EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Acoustic Modeling with Mixtures of Subspace Constrained Exponential Models

Karthik Visweswariah, Scott Axelrod, Ramesh Gopinath

IBM T.J. Watson Research Center, USA

Gaussian distributions are usually parameterized with their natural parameters: the mean μ and the covariance Σ. They can also be re-parameterized as exponential models with canonical parameters P = Σ^{-1} and ψ = Pμ. In this paper we consider modeling acoustics with mixtures of Gaussians parameterized with canonical parameters, where the parameters are constrained to lie in a shared affine subspace. This class of models includes Gaussian models with various constraints on their parameters: diagonal covariances, MLLT models, and the recently proposed EMLLT and SPAM models. We describe how to perform maximum likelihood estimation of the subspace and of the parameters within a fixed subspace. In speech recognition experiments, we show that this model improves upon all of the above classes of models with roughly the same number of parameters and with little computational overhead. In particular, we get a 30-40% relative improvement over LDA+MLLT models when using roughly the same number of parameters.
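The re-parameterization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restriction to two dimensions, and the flattened parameter layout [P11, P12, P22, psi1, psi2] are assumptions made only for illustration. The sketch converts (mean, covariance) to the canonical parameters (P, psi), evaluates the Gaussian log-density in both forms, and shows how a parameter vector could be formed as an affine combination of shared basis vectors.

```python
import math

# All helpers are hypothetical and restricted to 2-D for readability.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def to_canonical(mu, sigma):
    """Map (mean, covariance) to canonical parameters P = sigma^-1, psi = P mu."""
    P = inv2(sigma)
    return P, matvec(P, mu)

def logpdf_moment(x, mu, sigma):
    """Gaussian log-density written with the mean and covariance (d = 2)."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    quad = dot(diff, matvec(inv2(sigma), diff))
    return -math.log(2 * math.pi) - 0.5 * math.log(det2(sigma)) - 0.5 * quad

def logpdf_canonical(x, P, psi):
    """Same density in exponential form: -x'Px/2 + psi'x + log-normalizer."""
    mu = matvec(inv2(P), psi)  # recover the mean only to compute psi' P^-1 psi
    log_norm = (-math.log(2 * math.pi) + 0.5 * math.log(det2(P))
                - 0.5 * dot(psi, mu))
    return -0.5 * dot(x, matvec(P, x)) + dot(psi, x) + log_norm

def affine_combination(offset, basis, weights):
    """Parameter vector constrained to an affine subspace:
    theta = offset + sum_k weights[k] * basis[k].
    Assumed layout of theta: [P11, P12, P22, psi1, psi2]."""
    theta = list(offset)
    for w, b in zip(weights, basis):
        theta = [t + w * bi for t, bi in zip(theta, b)]
    return theta

mu, sigma, x = [1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], [0.3, 0.7]
P, psi = to_canonical(mu, sigma)
same = abs(logpdf_moment(x, mu, sigma) - logpdf_canonical(x, P, psi)) < 1e-9

# A basis that never touches P12 yields diagonal precision matrices,
# i.e. the diagonal-covariance special case.
diag_basis = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
theta = affine_combination([0.0] * 5, diag_basis, [0.5, 2.0, 1.0, -1.0])
```

With the diagonal basis above, the P12 entry of `theta` stays at zero whatever the weights, which is one way to see why diagonal-covariance models fall inside this class. In the mixture setting the basis would be shared across all Gaussians and only the weights would be specific to each Gaussian.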

Bibliographic reference.  Visweswariah, Karthik / Axelrod, Scott / Gopinath, Ramesh (2003): "Acoustic modeling with mixtures of subspace constrained exponential models", In EUROSPEECH-2003, 2613-2616.