9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Hilbert Envelope Based Spectro-Temporal Features for Phoneme Recognition in Telephone Speech

Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

IDIAP Research Institute, Switzerland

In this paper, we present a spectro-temporal feature extraction technique that uses sub-band Hilbert envelopes of relatively long segments of the speech signal. The Hilbert envelopes of the sub-bands are estimated using Frequency Domain Linear Prediction (FDLP). Spectral features are derived by integrating the sub-band Hilbert envelopes over short-term frames, and temporal features are formed by converting the FDLP envelopes into modulation frequency components. The two feature streams are combined at the phoneme posterior level and used as input features for a phoneme recognition system. To improve the robustness of the proposed features to telephone speech, the sub-band temporal envelopes are gain normalized prior to feature extraction. Phoneme recognition experiments on telephone speech in the HTIMIT database show significant performance improvements for the proposed features compared to other robust feature techniques (an average relative reduction of 11% in phoneme error rate).
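The envelope estimation step described in the abstract can be illustrated with a minimal sketch of the general FDLP idea: linear prediction applied to the DCT of a segment yields an all-pole model whose response approximates the squared Hilbert envelope over time. This is not the authors' implementation. The function names (`dct2`, `fdlp_envelope`, `frame_energies`), the model order, the diagonal loading, and the window and hop sizes are all assumptions chosen for illustration, and the sub-band decomposition and the modulation-frequency features are omitted.

```python
import numpy as np


def dct2(x):
    """Unnormalized DCT-II via an explicit cosine matrix (fine for short segments)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x


def fdlp_envelope(x, order=10, gain_normalize=True):
    """Approximate the squared temporal envelope of x with an all-pole model.

    Illustrative sketch only: `order` and the loading factor are assumed values.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct2(x)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([c[: n - m] @ c[m:] for m in range(order + 1)])
    # Toeplitz normal equations, with slight diagonal loading for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-8 * r[0] * np.eye(order)
    a = np.linalg.solve(R, -r[1:])
    gain = r[0] + a @ r[1:]  # prediction error power
    # Evaluate 1/|A|^2 on the angles that the DCT associates with each sample.
    theta = np.pi * (np.arange(n) + 0.5) / n
    lags = np.arange(1, order + 1)
    A = 1.0 + np.exp(-1j * np.outer(theta, lags)) @ a
    env = 1.0 / np.abs(A) ** 2
    # Gain normalization: drop the model gain so a constant input scaling cancels.
    return env if gain_normalize else gain * env


def frame_energies(env, win, hop):
    """Integrate the envelope over short-term frames (assumed rectangular window)."""
    starts = range(0, len(env) - win + 1, hop)
    return np.array([env[s : s + win].sum() for s in starts])
```

In this sketch, gain normalization simply omits the prediction gain, so multiplying the input by a constant leaves the normalized envelope unchanged. That is one plausible reading of why the normalization would help with level differences between telephone channels.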

Bibliographic reference. Thomas, Samuel / Ganapathy, Sriram / Hermansky, Hynek (2008): "Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech", in INTERSPEECH-2008, 1521-1524.