8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

ASR on Speech Reconstructed from Short-time Fourier Phase Spectra

Leigh David Alsteris, Kuldip K. Paliwal

Griffith University, Australia

In earlier papers, we measured the human intelligibility of speech stimuli reconstructed from either the short-time magnitude spectra (magnitude-only stimuli) or the short-time phase spectra (phase-only stimuli) of a speech signal. We demonstrated that, even for small analysis window durations of 20-40 ms (the range relevant to automatic speech recognition), the short-time phase spectrum can contribute to speech intelligibility as much as the short-time magnitude spectrum. In this paper, we perform automatic speech recognition on magnitude-only and phase-only stimuli. With an MFCC-based front-end, the recognition performance achieved on the phase-only stimuli is much worse than on the magnitude-only stimuli at small analysis window durations, which is inconsistent with the corresponding human intelligibility results. This implies that the MFCC feature set does not capture all of the discriminating information present in the speech signal.
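The abstract does not say how the magnitude-only and phase-only stimuli are constructed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that a phase-only frame keeps each spectral phase and sets the magnitude to unity, that a magnitude-only frame keeps each magnitude and sets the phase to zero, and that frames are Hann-windowed and resynthesized by overlap-add. The function names and the `frame_len`, `hop`, and `mode` parameters are hypothetical, and a naive DFT from the standard library stands in for an FFT to keep the sketch dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; keeps the real part (exact for conjugate-symmetric spectra)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def modify_spectrum(spectrum, mode):
    """Discard either the magnitude or the phase of every spectral bin."""
    if mode == "phase_only":
        # Assumption: magnitude replaced by unity, original phase kept.
        return [cmath.rect(1.0, cmath.phase(c)) for c in spectrum]
    if mode == "magnitude_only":
        # Assumption: phase replaced by zero, original magnitude kept.
        return [complex(abs(c), 0.0) for c in spectrum]
    raise ValueError("mode must be 'phase_only' or 'magnitude_only'")


def reconstruct(signal, frame_len, hop, mode):
    """Window each frame, modify its spectrum, and overlap-add the result."""
    # Periodic Hann analysis window (an assumption; the abstract names none).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        modified = modify_spectrum(dft(frame), mode)
        for i, value in enumerate(idft(modified)):
            out[start + i] += value
    return out
```

Under these assumptions, a 32 ms analysis window at a hypothetical 8 kHz sampling rate would correspond to `frame_len=256`, for example `reconstruct(samples, 256, 32, "phase_only")`. The resulting waveform could then be passed to an MFCC front-end in the same way as the original signal.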


Bibliographic reference. Alsteris, Leigh David / Paliwal, Kuldip K. (2004): "ASR on speech reconstructed from short-time Fourier phase spectra", In INTERSPEECH-2004, 565-568.