12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Discriminant Sub-Space Projection of Spectro-Temporal Speech Features Based on Maximizing Mutual Information

Martin Heckmann, Claudius Gläser

Honda Research Institute Europe GmbH, Germany

We previously developed noise-robust Hierarchical Spectro-Temporal (Hist) speech features. The features were learned in an unsupervised way from unlabeled speech data. In a final stage we deployed Principal Component Analysis (PCA) to reduce the feature dimensionality and to diagonalize the feature covariance. In this paper we investigate whether a discriminant projection can further increase performance. We maximize the mutual information between the features and the phoneme categories using a procedure known as Maximizing Rényi's Mutual Information (MRMI) and compare it to Linear Discriminant Analysis (LDA). Based on recognition tests in clean speech and in noise, i.e., in matching and mismatching conditions, we show that the discriminant projections increase recognition scores compared to PCA in matching conditions. However, this improvement does not transfer to the mismatching, i.e., noisy, conditions. We discuss measures to alleviate this problem. Overall, MRMI performs better than LDA.
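
To illustrate the difference between an unsupervised PCA projection and a discriminant projection such as LDA, the following is a minimal sketch on synthetic two-class data. It does not use Hist features and does not implement MRMI. The function names (`pca_projection`, `lda_projection`, `fisher_ratio`), the small ridge term, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pca_projection(X, k):
    """Return a (d, k) matrix whose columns are the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # keep the k largest
    return eigvecs[:, order]

def lda_projection(X, y, k):
    """Return a (d, k) matrix from the leading eigenvectors of Sw^{-1} Sb."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))                       # within-class scatter
    Sb = np.zeros((d, d))                       # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge (an assumption of this sketch) keeps Sw well conditioned.
    M = np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real

def fisher_ratio(z, y):
    """Between-class over within-class scatter of 1-D projected data z."""
    classes = np.unique(y)
    within = sum(((z[y == c] - z[y == c].mean()) ** 2).sum() for c in classes)
    between = sum((y == c).sum() * (z[y == c].mean() - z.mean()) ** 2
                  for c in classes)
    return between / within

# Toy data: dimension 0 has high variance but carries no class information,
# dimension 1 has low variance but separates the two classes.
rng = np.random.default_rng(0)
n = 200
X0 = rng.normal(size=(n, 2)) * np.array([5.0, 0.5]) + np.array([0.0, -1.0])
X1 = rng.normal(size=(n, 2)) * np.array([5.0, 0.5]) + np.array([0.0, 1.0])
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

W_pca = pca_projection(X, 1)
W_lda = lda_projection(X, y, 1)
ratio_pca = fisher_ratio(X @ W_pca[:, 0], y)
ratio_lda = fisher_ratio(X @ W_lda[:, 0], y)
```

In this toy setting PCA follows the direction of largest variance regardless of the labels, whereas LDA uses the labels to find the direction that separates the classes, so `ratio_lda` exceeds `ratio_pca`. Like the matching-condition tests in the abstract, this comparison is made on the same data the projection was estimated from and says nothing about behavior under mismatched noise.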

Bibliographic reference.  Heckmann, Martin / Gläser, Claudius (2011): "Discriminant sub-space projection of spectro-temporal speech features based on maximizing mutual information", In INTERSPEECH-2011, 225-228.