7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

An Evaluation of Using Mutual Information for Selection of Acoustic-Features Representation of Phonemes for Speech Recognition

Mohamed Kamal Omar (1), Ken Chen (1), Mark Hasegawa-Johnson (1), Yigal Brandman (2)

(1) University of Illinois at Urbana-Champaign, USA; (2) Phonetact Inc., USA

This paper addresses the problem of finding a subset of the acoustic feature space that best represents the phoneme set used in a speech recognition system. A maximum mutual information approach is presented for selecting the acoustic features that are combined to represent the distinctions among the phonemes. For the same feature-vector length, acoustic features selected with the maximum mutual information criterion yield slightly higher overall phoneme recognition accuracy than FFT-based Mel-frequency cepstral coefficients (MFCC), both for clean speech and at 10 dB.
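To make the selection criterion concrete, the following is a minimal sketch of ranking candidate features by their estimated mutual information with the phoneme labels. It is not the estimator used in the paper, which the abstract does not describe. It assumes each feature has already been quantized to discrete values, uses a simple plug-in (count-based) estimate, and scores each feature on its own rather than jointly. The names `mutual_information`, `rank_features`, `f1`, and `f2` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(feature_columns, phonemes):
    """Return feature names sorted by MI with the phoneme labels, highest first.

    Assumption: feature_columns maps a (hypothetical) feature name to a list
    of quantized values, one per frame, aligned with the phoneme labels.
    """
    scores = {name: mutual_information(col, phonemes)
              for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data for illustration only; not taken from the paper.
phonemes = ["a", "a", "b", "b"]
features = {
    "f1": [0, 0, 1, 1],  # perfectly separates the two phonemes
    "f2": [0, 1, 0, 1],  # independent of the phoneme label
}
ranking = rank_features(features, phonemes)
```

In this toy case `f1` carries one bit of information about the phoneme label and `f2` carries none, so `f1` is ranked first. A per-feature ranking ignores redundancy between features; selecting features to be combined would need a joint estimate.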

Across 16 different feature sets, ranking the feature sets by mutual information predicts phoneme recognition accuracy with a correlation coefficient of 0.71, compared to a correlation coefficient of 0.28 when the feature sets are ranked by a criterion based on the average pair-wise Kullback-Leibler divergence.
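As an illustration of how such a comparison could be computed, the sketch below converts a criterion score per feature set into ranks and correlates those ranks with recognition accuracy. The abstract does not state which correlation coefficient was used, so the Pearson formula here is an assumption. The helper names `ranks` and `pearson` are hypothetical, and the numbers are toy values, not results from the paper.

```python
import math


def ranks(scores):
    """Rank scores in ascending order: the lowest score gets rank 1.

    Ties are not handled specially; this is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only; not taken from the paper.
criterion_scores = [0.2, 0.9, 0.5, 0.7]   # e.g. a score per feature set
accuracies = [55.0, 63.0, 61.0, 60.0]     # e.g. accuracy (%) per feature set

criterion_ranks = ranks(criterion_scores)
correlation = pearson(criterion_ranks, accuracies)
```

A coefficient near 1 would mean that the criterion's ranking tracks accuracy closely, and a coefficient near 0 would mean that it carries little predictive value.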



Bibliographic reference. Omar, Mohamed Kamal / Chen, Ken / Hasegawa-Johnson, Mark / Brandman, Yigal (2002): "An evaluation of using mutual information for selection of acoustic-features representation of phonemes for speech recognition", in ICSLP-2002, 2129-2132.