In our previous work [1, 2], we introduced Fepstrum, an improved modulation spectrum estimation technique that overcomes certain theoretical as well as practical shortcomings of previously published modulation-spectrum-related techniques [7, 8, 9]. In this paper, we provide extensive further ASR results on the TIMIT corpus using Tandem-processed Fepstrum features. These results are compared with those of TRAPS features derived from hierarchical and parallel structures of neural networks. Unlike the multiple neural networks trained over multiple time-frequency patches or frequency bands as in , we train a single neural network on the concatenated Fepstrum and MFCC features to derive Tandem(Fepstrum+MFCC) features. The resulting phoneme recognition accuracy of the concatenated Tandem(Fepstrum+MFCC)+MFCC feature is 76.5% on the TIMIT core test set and 77.6% on the complete test set, making these among the best reported results on the TIMIT continuous phoneme recognition task.
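The single-network Tandem pipeline described above can be illustrated with a minimal sketch. It assumes the common Tandem recipe: an MLP produces phoneme posteriors, these are log-transformed and decorrelated with PCA, and the result is appended to MFCCs. The abstract does not specify the network topology, the feature dimensions, or the decorrelation step, so every dimension below is hypothetical. Fepstrum extraction itself is not shown. The inputs and the MLP weights are random placeholders that only demonstrate the data flow. In a real system, the MLP would be trained on frame-level phoneme labels, and the PCA basis would be estimated on training data.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_posteriors(x, w1, b1, w2, b2):
    # One-hidden-layer MLP (assumed topology): sigmoid hidden units,
    # softmax output giving per-frame phoneme posteriors.
    h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return softmax(h @ w2 + b2)

def pca_basis(data, n_components):
    # Mean and top eigenvectors of the covariance (largest eigenvalues first).
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def tandem_features(fep, mfcc, params, n_components):
    # Single network on concatenated Fepstrum + MFCC frames.
    x = np.concatenate([fep, mfcc], axis=1)
    log_post = np.log(mlp_posteriors(x, *params) + 1e-10)
    # Here the PCA basis is estimated on the same frames for brevity only.
    mean, basis = pca_basis(log_post, n_components)
    tandem = (log_post - mean) @ basis
    # Tandem(Fepstrum+MFCC)+MFCC: append the raw MFCCs to the Tandem output.
    return np.concatenate([tandem, mfcc], axis=1)

# Hypothetical sizes and random placeholder data, for illustration only.
rng = np.random.default_rng(0)
n_frames, fep_dim, mfcc_dim, hidden, n_phones = 200, 20, 39, 64, 39
fep = rng.standard_normal((n_frames, fep_dim))
mfcc = rng.standard_normal((n_frames, mfcc_dim))
params = (
    0.1 * rng.standard_normal((fep_dim + mfcc_dim, hidden)), np.zeros(hidden),
    0.1 * rng.standard_normal((hidden, n_phones)), np.zeros(n_phones),
)
feats = tandem_features(fep, mfcc, params, n_components=25)
print(feats.shape)  # (200, 64): 25 Tandem dims + 39 MFCC dims
```

Training one network on the concatenated streams lets it model the joint Fepstrum/MFCC evidence directly. TRAPS-style systems, by contrast, train separate networks per band or patch and merge them afterwards.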
Bibliographic reference. Tyagi, Vivek (2008): "Tandem processing of fepstrum features", In INTERSPEECH-2008, 2246-2249.