INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition

Yuuki Tachioka (1), Shinji Watanabe (2), Jonathan Le Roux (2), John R. Hershey (2)

(1) Mitsubishi Electric, Japan
(2) MERL, USA

Linear discriminant analysis (LDA) is a simple and effective feature transformation technique that aims to improve discriminability by maximizing the ratio of the between-class variance to the within-class variance. However, LDA does not explicitly consider a sequential discriminative criterion, which consists of directly reducing the errors of a speech recognizer. This paper proposes a simple extension of LDA, called sequential LDA (sLDA), based on a sequential discriminative criterion computed from Gaussian statistics obtained from sequential maximum mutual information (MMI) training. Although the objective function of the proposed LDA can be regarded as a special case of various discriminative feature transformation techniques (for example, f-MPE or the bottom layer of a neural network), the transformation matrix can be obtained as the closed-form solution to a generalized eigenvalue problem, in contrast to the gradient-descent-based optimization methods usually used in these techniques. Experiments on an LVCSR task (Corpus of Spontaneous Japanese) and a noisy speech recognition task (2nd CHiME challenge) show consistent improvements over standard LDA due to the sequential discriminative training. In addition, the proposed method, despite its simple and fast computation, improved performance in combination with a discriminative feature transformation (f-bMMI), perhaps by providing a good initialization to f-bMMI.
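The closed-form solution mentioned above can be illustrated with standard LDA, whose projection solves the generalized eigenvalue problem S_b w = lambda S_w w, where S_b and S_w are the between-class and within-class scatter matrices. The following is a minimal sketch of that standard case only, assuming NumPy. The function name `lda_transform` and its arguments are hypothetical and do not come from the paper. The abstract does not give the exact form of the sLDA statistics, so this sketch does not attempt to reproduce them. Presumably sLDA would substitute statistics gathered during MMI training for the scatter matrices below.

```python
import numpy as np


def lda_transform(features, labels, n_components):
    """Standard LDA: solve S_b w = lambda S_w w and keep the top directions.

    Illustrative sketch only; this is plain LDA, not the paper's sLDA.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)

    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - global_mean)[:, None]
        s_b += len(x_c) * (diff @ diff.T)

    # Reduce the generalized problem to an ordinary symmetric one:
    # with S_w = L L^T and w = L^{-T} v, we get (L^{-1} S_b L^{-T}) v = lambda v.
    # This assumes S_w is positive definite (enough samples per dimension).
    chol = np.linalg.cholesky(s_w)
    chol_inv = np.linalg.inv(chol)
    sym = chol_inv @ s_b @ chol_inv.T
    eigvals, eigvecs = np.linalg.eigh(sym)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    w = chol_inv.T @ eigvecs[:, order]
    return w, eigvals[order]
```

Projecting features is then `features @ w`. No iterative optimization is involved, which is the contrast with gradient-descent-based methods that the abstract draws.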


Bibliographic reference. Tachioka, Yuuki / Watanabe, Shinji / Le Roux, Jonathan / Hershey, John R. (2014): "Sequential maximum mutual information linear discriminant analysis for speech recognition", in INTERSPEECH-2014, 2415-2419.