Interspeech'2005 - Eurospeech
This paper describes the application of Mixtures of Probabilistic Principal Component Analyzers (MPPCA) to modeling the observation distributions in a speech recognition system. The MPPCA model is a mixture of Gaussians with constrained covariances that approximate full covariances with fewer effective parameters, where the degree of approximation is controlled by the user. The paper summarizes the necessary basics of the MPPCA model, describes a simple extension of the basic model that sets the user-defined complexity of the constrained covariance in a more automatic way, and explains how to deal with numerical problems that occur in typical speech recognition systems. The MPPCA model is tested against a diagonal-covariance and a full-covariance model for our best acoustic model to date, with 5000 quinphone clustered states and 80k/160k Gaussians in total, on a large spontaneous Japanese speech task. Results show that the error rate on the standard test set improves from 22.4% to 19.8% when moving to full covariances. Several of the MPPCA models tested reach the same error rates with fewer effective parameters but fail to improve significantly over full covariances; possible reasons are discussed.
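To illustrate the parameter savings the abstract refers to, the following sketch (not the paper's code; the dimensions `d` and `q` are hypothetical choices) builds a PPCA-constrained covariance C = W W^T + sigma^2 I and compares its number of free parameters with that of a full covariance:

```python
import numpy as np

# Hypothetical illustration: a PPCA-constrained covariance approximates a
# full covariance as C = W W^T + sigma^2 * I, where W is d x q with q << d.
# The effective parameters per Gaussian drop from d*(d+1)/2 (full, symmetric)
# to d*q + 1 (the factor matrix W plus one isotropic noise variance).
d, q = 39, 5                        # assumed feature dim and latent dim
rng = np.random.default_rng(0)
W = rng.standard_normal((d, q))
sigma2 = 0.1

C = W @ W.T + sigma2 * np.eye(d)    # constrained covariance, full rank

full_params = d * (d + 1) // 2      # free parameters of a full covariance
ppca_params = d * q + 1             # W plus the isotropic noise variance
print(full_params, ppca_params)     # 780 vs 196
```

The user-controlled complexity the abstract mentions corresponds to the choice of `q`: larger `q` moves the model closer to a full covariance, `q = 0` degenerates toward an isotropic one.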
Bibliographic reference: Schuster, Mike / Hori, Takaaki / Nakamura, Atsushi (2005): "Experiments with probabilistic principal component analysis in LVCSR", in INTERSPEECH-2005, 1685-1688.