INTERSPEECH 2010
11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Improved Phoneme Recognition by Integrating Evidence from Spectro-Temporal and Cepstral Features

Shang-wen Li, Liang-che Sun, Lin-shan Lee

National Taiwan University, Taiwan

Gabor features have been proposed for extracting spectro-temporal modulation information, yielding significant improvements in recognition performance. In this paper, we propose integrating Gabor posteriors with MFCC posteriors, which yields a relative improvement of 14.3% over an MFCC Tandem system. For different types of acoustic units, we analyze the complementarity between Gabor features, which capture long-term spectro-temporal modulation information in the mel-spectrogram, and MFCC features, which capture short-term temporal information in the cepstral domain. We find that Gabor features perform better for vowel recognition, whereas MFCCs perform better for consonants, which explains why integrating the two improves performance.
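The abstract does not state how the Gabor and MFCC posterior streams are combined. Purely as an illustration, the sketch below assumes a frame-level weighted log-linear combination (a weighted geometric mean, renormalized), followed by the usual Tandem step of taking log posteriors as HMM features. The function names `combine_posteriors` and `tandem_features`, the stream weight `w`, and the toy 3-phone posteriors are all hypothetical and are not taken from the paper; the decorrelation step (e.g. PCA) that Tandem systems normally apply is omitted.

```python
import math
from typing import List, Sequence


def combine_posteriors(
    p_gabor: Sequence[float],
    p_mfcc: Sequence[float],
    w: float = 0.5,
    eps: float = 1e-10,
) -> List[float]:
    """Weighted log-linear combination of two per-frame phone posterior vectors.

    Hypothetical scheme: w weights the Gabor stream, 1 - w the MFCC stream.
    The result is renormalized so it sums to 1.
    """
    if len(p_gabor) != len(p_mfcc):
        raise ValueError("posterior vectors must cover the same phone set")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Floor each probability at eps so log() is always defined.
    log_scores = [
        w * math.log(max(a, eps)) + (1.0 - w) * math.log(max(b, eps))
        for a, b in zip(p_gabor, p_mfcc)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores)
    unnorm = [math.exp(s - m) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def tandem_features(posteriors: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Log posteriors used as HMM observation features (PCA/decorrelation omitted)."""
    return [math.log(max(p, eps)) for p in posteriors]


if __name__ == "__main__":
    # Toy 3-phone frame: the Gabor stream is confident, the MFCC stream is not.
    p_gabor = [0.7, 0.2, 0.1]
    p_mfcc = [0.4, 0.4, 0.2]
    combined = combine_posteriors(p_gabor, p_mfcc, w=0.5)
    print([round(p, 3) for p in combined])
    print([round(f, 3) for f in tandem_features(combined)])
```

Setting `w = 1.0` or `w = 0.0` recovers the Gabor-only or MFCC-only posteriors, so the weight gives a simple way to explore how much each stream contributes; a frame-dependent weight (for instance, one favoring the stream with lower posterior entropy) would be a natural variant, but that too is an assumption rather than something stated in the abstract.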


Bibliographic reference. Li, Shang-wen / Sun, Liang-che / Lee, Lin-shan (2010): "Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features", In INTERSPEECH-2010, 1177-1180.