8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Fepstrum: An Improved Modulation Spectrum for ASR

Vivek Tyagi

IBM India Research Lab, India

In our previous work [3, 4], we introduced the fepstrum, an improved modulation spectrum estimation technique that overcomes certain theoretical and practical shortcomings of previously published modulation-spectrum techniques [11, 13, 14]. In [3], we also showed that the fepstrum is an exact dual of the well-known real cepstrum. In this paper, we provide further ASR results using fepstrum features on the TIMIT core test-set phoneme recognition task with a triphone context-dependent HMM recognizer. Fepstrum performance is rigorously benchmarked against a competitive MFCC baseline, the other best results reported on the same task [7, 8, 9], and a technique based on heterogeneous, multiple classifiers [5]. In our experiments, a composite feature formed by simple concatenation of the fepstrum and MFCC features is used to train a conventional hidden Markov model / Gaussian mixture model (HMM-GMM) recognizer. This composite feature achieves a phoneme recognition accuracy of 74.6% on the TIMIT core test set, which is 1.8% absolute better than the 72.8% accuracy of the MFCC HMM-GMM recognizer.
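
The frame-wise concatenation described in the abstract can be illustrated with a minimal sketch. It assumes the two feature streams are already frame-aligned (one vector per frame in each stream). The function name `concatenate_features`, the toy dimensions, and the numeric values are hypothetical and are not taken from the paper, which does not specify its feature dimensions or implementation here.

```python
from typing import List, Sequence

Frame = List[float]


def concatenate_features(
    mfcc: Sequence[Sequence[float]],
    fepstrum: Sequence[Sequence[float]],
) -> List[Frame]:
    """Join two frame-aligned feature streams into one composite vector per frame.

    Assumption: both streams have the same number of frames. Any resampling
    or alignment step would have to happen before this function is called.
    """
    if len(mfcc) != len(fepstrum):
        raise ValueError("feature streams must have the same number of frames")
    # For each frame, append the fepstrum coefficients after the MFCC coefficients.
    return [list(m) + list(f) for m, f in zip(mfcc, fepstrum)]


# Toy data only: 2 frames, 3 "MFCC" values and 2 "fepstrum" values per frame.
mfcc_frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
fepstrum_frames = [[0.1, 0.2], [0.3, 0.4]]

composite = concatenate_features(mfcc_frames, fepstrum_frames)
```

Each composite frame has a dimension equal to the sum of the two input dimensions, and vectors of this kind would then serve as the observations for HMM-GMM training.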

Bibliographic reference.  Tyagi, Vivek (2007): "Fepstrum: an improved modulation spectrum for ASR", In INTERSPEECH-2007, 1114-1117.