ISCA Archive Interspeech 2008

Short- and long-term dynamic features for robust speech recognition

Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Short-term temporal information in speech is widely used in automatic speech recognition (ASR) systems in the form of dynamic features. Long-term temporal information has also attracted attention recently and is used to complement traditional short-term features (typically spanning 25 to 100 ms). There are several approaches to representing long-term temporal information in ASR systems. However, these approaches rely on high-dimensional feature spaces to capture the long-term temporal information. This paper describes an attempt to incorporate long-term temporal information into a feature parameter set by combining conventional dynamic features extracted from both short- and long-term cepstrum sequences. The proposed method captures the temporal contexts of phonemes through long-term features and the spectral variations within phonemes through short-term features. In an experiment on the realistic speech corpus CENSREC-2, the proposed method yielded higher performance than a standard feature parameter set with static mel-frequency cepstral coefficients (MFCCs) and their short-term dynamic features.
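
As a rough illustration of the idea in the abstract, the sketch below computes conventional regression-based dynamic (delta) features over a short and a long frame window and appends both to the static cepstra. This is not the authors' implementation: the function names `delta` and `combine_features`, the half-window sizes (2 and 10 frames), and the edge handling (repeating the first and last frame) are all assumptions made for the example, and the abstract does not state the window lengths actually used.

```python
def delta(frames, half_window):
    """Regression-based dynamic features over +/- half_window frames.

    frames: list of per-frame cepstral vectors (lists of floats).
    Edges are handled by repeating the first/last frame (an assumption).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Normalizer of the standard delta regression formula.
    denom = 2 * sum(n * n for n in range(1, half_window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, half_window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def combine_features(frames, short_half_window=2, long_half_window=10):
    """Concatenate static, short-term delta, and long-term delta features.

    The window sizes are hypothetical defaults, not values from the paper.
    """
    short_delta = delta(frames, short_half_window)
    long_delta = delta(frames, long_half_window)
    return [s + sd + ld for s, sd, ld in zip(frames, short_delta, long_delta)]


if __name__ == "__main__":
    # Toy input: 30 frames of a 1-dimensional "cepstrum" rising linearly.
    toy = [[float(t)] for t in range(30)]
    combined = combine_features(toy)
    print(len(combined), len(combined[0]))
```

Appending a second delta stream grows the feature vector only by the static dimension, which is in keeping with the abstract's aim of adding long-term information without moving to a high-dimensional feature space.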

doi: 10.21437/Interspeech.2008-450

Cite as: Fukuda, T., Ichikawa, O., Nishimura, M. (2008) Short- and long-term dynamic features for robust speech recognition. Proc. Interspeech 2008, 2262-2265, doi: 10.21437/Interspeech.2008-450

  author={Takashi Fukuda and Osamu Ichikawa and Masafumi Nishimura},
  title={{Short- and long-term dynamic features for robust speech recognition}},
  booktitle={Proc. Interspeech 2008},