ISCA Archive Interspeech 2007

A study on temporal features derived by analytic signal

Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai

Traditional features for automatic speech recognition (ASR), such as MFCC (Mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) [6], are derived from short-term spectral envelopes and can be used to realize promising ASR systems. In contrast, features extracted by TRAPs-like classifiers [2] are based on long-term envelopes of narrow-band signals. Both forms of feature extraction share a common representation: the energy of narrow-band signals.

We have developed a feature extraction system that depends not only on the energy but also on the modulation of carrier signals. Carrier signals carry attributes such as the spectral centroid, spectral gradient, number of zero-crossing points, and frequency modulation. Some experiments show that, in addition to the spectral envelope and its modulation, the zero-crossing points and frequency modulation play a significant part in human speech perception [4].
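As a rough illustration of the envelope/carrier split that the title refers to, the sketch below computes the analytic signal of a narrow-band signal and separates it into an amplitude envelope (the energy-related part) and an instantaneous frequency (one carrier attribute). It is not the authors' implementation: the function name `envelope_and_carrier`, the synthetic test tone, and the use of SciPy's FFT-based `hilbert` are assumptions made for this example only.

```python
import numpy as np
from scipy.signal import hilbert


def envelope_and_carrier(x, fs):
    """Split a narrow-band signal into envelope and instantaneous frequency.

    Hypothetical helper for illustration; x is a 1-D real signal and
    fs is its sampling rate in Hz.
    """
    z = hilbert(x)                      # analytic signal x + j * H{x}
    envelope = np.abs(z)                # amplitude envelope
    phase = np.unwrap(np.angle(z))      # continuous carrier phase (radians)
    # Phase difference per sample, converted to Hz; one sample shorter than x.
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return envelope, inst_freq


# Synthetic narrow-band signal (an assumption, not data from the paper):
# a 1 kHz carrier whose amplitude is modulated at 4 Hz.
fs = 16000
t = np.arange(fs) / fs
am = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = am * np.cos(2 * np.pi * 1000 * t)

envelope, inst_freq = envelope_and_carrier(x, fs)
print("mean instantaneous frequency (Hz):", inst_freq[1000:-1000].mean())
print("max envelope error:", np.max(np.abs(envelope - am)[1000:-1000]))
```

For real speech, the input would first be passed through a band-pass filter bank so that each channel is narrow-band; the envelope then corresponds to the energy trajectory used by conventional features, while the instantaneous frequency is one possible description of the carrier.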

In this study, we propose a method of carrier analysis, evaluate it, and discuss the effectiveness of carrier analysis for ASR. Our method reduces the phoneme error rate from 45.7% to 38.6%, an absolute reduction of 7.1 points.

doi: 10.21437/Interspeech.2007-369

Cite as: Kubo, Y., Okawa, S., Kurematsu, A., Shirai, K. (2007) A study on temporal features derived by analytic signal. Proc. Interspeech 2007, 1130-1133, doi: 10.21437/Interspeech.2007-369

  author={Yotaro Kubo and Shigeki Okawa and Akira Kurematsu and Katsuhiko Shirai},
  title={{A study on temporal features derived by analytic signal}},
  booktitle={Proc. Interspeech 2007},