INTERSPEECH 2004 - ICSLP
Automatic speech recognition begins with the extraction of meaningful features from the speech signal. Conventional features such as the MFCC are derived from the Fourier transform magnitude spectrum and ignore the phase spectrum entirely. In our earlier work we showed the importance of the modified group delay feature (MODGDF), which is derived from the Fourier transform phase spectrum, for speaker and phoneme recognition. In this paper we analyse the feature theoretically and justify it in terms of de-correlation, robustness to convolutional and white noise, cluster structure, separability in a lower-dimensional space, task independence and class separability. We also present results for speaker identification and continuous speech recognition using the MODGDF as the front end. Joint features derived from the MODGDF and the MFCC gave significant improvements in recognition performance on both tasks. On the basis of the analytical results in the first half of the paper and the performance evaluation in the second half, we propose the MODGDF as an alternative spectral representation of speech.
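As background for readers unfamiliar with phase-derived features, the following is a minimal sketch of the ordinary (unmodified) group delay function, the negative derivative of the Fourier phase, computed from the signal x[n] and its time-weighted copy n·x[n] without explicit phase unwrapping. It is not the MODGDF itself: the modification steps described in the authors' earlier work are not reproduced here. The function names `dft` and `group_delay` and the floor `eps` are illustrative assumptions, not taken from the paper.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x, eps=1e-12):
    """Group delay per DFT bin, in samples.

    Uses the identity tau = (X_R * Y_R + X_I * Y_I) / |X|^2, where X is the
    DFT of x[n] and Y is the DFT of n * x[n]. The floor eps (an assumption of
    this sketch) guards against division by zero at spectral nulls.
    """
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


# A unit impulse delayed by 3 samples has a constant group delay of 3.
signal = [0.0] * 16
signal[3] = 1.0
delays = group_delay(signal)
```

The denominator |X|^2 becomes very small near spectral nulls, which makes the raw group delay spiky for real speech. That behaviour is the usual motivation for modifying the function before using it as a feature.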
Bibliographic reference. Murthy, Hema A. / Hegde, Rajesh Mahanand / Gadde, Venkata Ramana Rao (2004): "The modified group delay feature: a new spectral representation of speech", In INTERSPEECH-2004, 913-916.