We propose a method for discriminatively training acoustic models with sparse inverse covariance (precision) matrices, in order to improve model robustness when training data is insufficient. Acoustic models with sparse inverse covariance matrices were previously proposed to address over-fitting on inadequate training data: since many entries of the inverse covariance matrices are driven to zero, the number of free parameters to be estimated is reduced. However, such models have so far been trained only with maximum likelihood (ML) estimation, and it is well known that discriminative training can further improve recognition accuracy. We therefore study, for the first time, discriminative training of acoustic models with sparse inverse covariance matrices. An L1 regularization term is added to the traditional discriminative training objective function to penalize complex models and to automatically sparsify the inverse covariance matrices; the new objective function is optimized by maximizing a weak-sense auxiliary function. Experimental results on the Wall Street Journal data set show that our method effectively regularizes model complexity and allows more Gaussian components to be trained, so it can better model the non-Gaussian nature of the speech feature vectors. Compared with the standard maximum mutual information (MMI) training method, the proposed method significantly improves recognition accuracy.
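To illustrate how an L1 penalty can drive precision-matrix entries exactly to zero, the sketch below applies the soft-threshold operator (the proximal operator of the L1 norm) to the off-diagonal entries of a small, hypothetical precision matrix. This is only a minimal illustration of L1-induced sparsification; the paper's actual optimization, which maximizes a weak-sense auxiliary function within discriminative training, is not reproduced here, and the matrix values and threshold are assumptions for the example.

```python
def soft_threshold_precision(P, lam):
    """Apply the L1 proximal (soft-threshold) operator to the
    off-diagonal entries of a precision matrix P, driving entries
    with magnitude below lam exactly to zero. Diagonal entries are
    left untouched; shrinking off-diagonals toward zero only
    increases diagonal dominance."""
    n = len(P)
    S = [row[:] for row in P]
    for i in range(n):
        for j in range(n):
            if i != j:
                p = P[i][j]
                mag = abs(p) - lam
                S[i][j] = (1.0 if p > 0 else -1.0) * mag if mag > 0 else 0.0
    return S

# Hypothetical 3x3 precision matrix with weak off-diagonal couplings.
P = [[2.0, 0.05, 0.3],
     [0.05, 2.0, 0.02],
     [0.3, 0.02, 2.0]]
S = soft_threshold_precision(P, lam=0.1)

# Count off-diagonal entries driven to zero, i.e. the parameters
# the model no longer needs to estimate.
zeros = sum(1 for i in range(3) for j in range(3)
            if i != j and S[i][j] == 0.0)
```

With the assumed threshold, the two weak couplings (0.05 and 0.02) are zeroed while the stronger one (0.3) is merely shrunk, which is the mechanism by which the L1 term reduces the number of free parameters.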
Bibliographic reference. Zhang, Weibin / Fung, Pascale (2013): "Discriminatively trained sparse inverse covariance matrices for low resource acoustic modeling", in Proc. INTERSPEECH 2013, pp. 2350-2354.