INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Study on Temporal Features Derived by Analytic Signal

Yotaro Kubo (1), Shigeki Okawa (2), Akira Kurematsu (1), Katsuhiko Shirai (1)

(1) Waseda University, Japan
(2) Chiba Institute of Technology, Japan

Traditional feature extraction methods for automatic speech recognition (ASR), such as MFCC (Mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) [6], derive features from short-term spectral envelopes and can be used to realize promising ASR systems. In contrast, features extracted by TRAPs-like classifiers [2] are based on long-term envelopes of narrow-band signals. Both forms of feature extraction share a common representation: the energy of narrow-band signals.

We have developed a feature extraction system that depends not only on the energy but also on the modulation of carrier signals. Carrier signals carry attributes such as the spectral centroid, spectral gradient, number of zero-crossing points, and frequency modulation. Some experiments show that not only the spectral envelope and its modulation but also the zero-crossing points and frequency modulation contribute significantly to human speech perception [4].
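
As a rough illustration of the envelope/carrier distinction drawn above, the following sketch splits a narrow-band signal into its envelope and carrier using the analytic signal. It is not the authors' feature extraction system: the function names, the FFT-based construction, the synthetic test tone, and the choice of instantaneous frequency and zero-crossing count as carrier attributes are all assumptions made for this example.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (standard discrete Hilbert construction).

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    so the real part of the result equals x.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as-is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist bin as-is
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def envelope_and_carrier(x, fs):
    """Return (envelope, instantaneous frequency in Hz, carrier zero-crossing count).

    Hypothetical helper: the envelope is the magnitude of the analytic signal,
    and the carrier is the unit-amplitude cosine of its unwrapped phase.
    """
    z = analytic_signal(x)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    carrier = np.cos(phase)
    signs = np.signbit(carrier)
    zero_crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
    return envelope, inst_freq, zero_crossings

# Synthetic narrow-band test signal (assumed values, not from the paper):
# a 1 kHz carrier with 4 Hz amplitude modulation, sampled at 8 kHz for 1 s.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)

envelope, inst_freq, zc = envelope_and_carrier(x, fs)
print("median instantaneous frequency (Hz):", np.median(inst_freq))
print("carrier zero crossings:", zc)
```

In this toy case the envelope recovers the slow 4 Hz modulation, while the instantaneous frequency and zero-crossing count describe the carrier alone; energy-based features would capture only the former.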

In this study, we propose a method of carrier analysis, evaluate this method, and discuss the effectiveness of carrier analysis for ASR. Our method can reduce the phoneme error rate from 45.7% to 38.6%.


Bibliographic reference.  Kubo, Yotaro / Okawa, Shigeki / Kurematsu, Akira / Shirai, Katsuhiko (2007): "A study on temporal features derived by analytic signal", In INTERSPEECH-2007, 1130-1133.