ISCA Archive Interspeech 2009

Tandem representations of spectral envelope and modulation frequency features for ASR

Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

We present a feature extraction technique for automatic speech recognition (ASR) that uses Tandem representations of short-term spectral envelope and modulation frequency features. These features, derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction, are combined at the phoneme posterior level. Tandem representations derived from these phoneme posteriors are used with HMM-based ASR systems for both small vocabulary and large vocabulary continuous speech recognition (LVCSR) tasks. For a small vocabulary continuous digit task on the OGI Digits database, the proposed features reduce the word error rate (WER) by 13% relative to other feature extraction techniques. We obtain a relative reduction of about 14% in WER for an LVCSR task using the NIST RT05 evaluation data. For phoneme recognition tasks on the TIMIT database, these features provide a relative improvement of 13% compared to other techniques.
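The figures above are relative reductions, not absolute ones. As a minimal sketch of how such a figure is computed, the function below divides the drop in WER by the baseline WER. The function name and the example WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are WERs in the same unit (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not taken from the paper):
# a baseline WER of 4.00% lowered to 3.48% is an absolute drop of
# 0.52 points, which corresponds to a relative reduction of about 13%.
example_reduction = relative_wer_reduction(4.00, 3.48)
print(f"{example_reduction:.1%}")
```

Under this definition, the same relative reduction corresponds to a smaller absolute change when the baseline WER is low, which is why relative figures are usually reported alongside the task and baseline.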


doi: 10.21437/Interspeech.2009-748

Cite as: Thomas, S., Ganapathy, S., Hermansky, H. (2009) Tandem representations of spectral envelope and modulation frequency features for ASR. Proc. Interspeech 2009, 2955-2958, doi: 10.21437/Interspeech.2009-748

@inproceedings{thomas09_interspeech,
  author={Samuel Thomas and Sriram Ganapathy and Hynek Hermansky},
  title={{Tandem representations of spectral envelope and modulation frequency features for ASR}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2955--2958},
  doi={10.21437/Interspeech.2009-748}
}