12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Log-Linear Optimization of Second-Order Polynomial Features with Subsequent Dimension Reduction for Speech Recognition

Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

Second-order polynomial features are useful for speech recognition because they can model class-specific covariance even with a pooled-covariance acoustic model. Previous experiments with second-order features have shown word error rate improvements, but at the price of a large increase in the number of parameters. This paper investigates discriminative training of second-order features with a subsequent dimension reduction transform that limits this increase. The acoustic model parameters and the transformation matrix parameters are modeled log-linearly and optimized using the maximum mutual information criterion. The advantage of log-linear optimization lies in its ability to robustly combine different kinds of features. Experiments on second-order MFCC features for the EPPS large-vocabulary task show a decrease in word error rate.
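
As a rough illustration of the feature pipeline described in the abstract, the sketch below augments a feature vector with all second-order products and then applies a linear dimension reduction. It is only a minimal sketch under stated assumptions: the function names `second_order_features` and `project` and the matrix `A` are hypothetical, and `A` is a fixed toy matrix here, whereas in the paper the transformation is trained discriminatively.

```python
from itertools import combinations_with_replacement


def second_order_features(x):
    """Return x followed by all products x_i * x_j with i <= j.

    For a d-dimensional input this yields d + d*(d+1)/2 values, which is
    why the parameter count grows quickly with d.
    """
    products = [a * b for a, b in combinations_with_replacement(x, 2)]
    return list(x) + products


def project(z, A):
    """Apply a linear dimension reduction: each row of A gives one output."""
    return [sum(a * v for a, v in zip(row, z)) for row in A]


# Toy 3-dimensional feature vector (a stand-in for an MFCC frame).
x = [1.0, 2.0, 3.0]
z = second_order_features(x)  # 3 first-order + 6 second-order values

# Hypothetical 2 x 9 reduction matrix; illustrative values only.
A = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
]
y = project(z, A)
```

The augmented vector grows quadratically with the input dimension, while the projected vector `y` keeps the size of the downstream acoustic model under control, which is the trade-off the abstract refers to.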


Bibliographic reference. Tahir, Muhammad Ali / Schlüter, Ralf / Ney, Hermann (2011): "Log-linear optimization of second-order polynomial features with subsequent dimension reduction for speech recognition", in INTERSPEECH-2011, 1705-1708.