5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Speaker-Independent Speech Recognition Using Micro Segment Spectrum Integration

Kiyoaki Aikawa

NTT Human Interface Laboratories, Speech and Acoustics Laboratories, Japan

This paper proposes a new spectral estimation method for automatic speech recognition. A spectrum estimated with a conventional data window of around 30 ms shows harmonic structure in the voiced portions of speech. For female voices with high F0, the harmonic frequency interval is often comparable to the formant frequency interval, which causes spectral estimation errors. The new idea is to estimate the spectrum by taking the Lp norm of the time series of spectra obtained from very short speech segments. The new method, called micro-segment spectrum integration, provides (1) precise spectral estimation that is not affected by harmonic structure, and (2) noise robustness by suppressing noisy speech segments. Phoneme recognition experiments demonstrate that micro-segment spectrum integration outperforms conventional spectral estimation methods.
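
The core idea can be illustrated with a minimal sketch. It assumes magnitude spectra from Hann-windowed micro segments and a plain Lp norm taken across segments at each frequency bin. The function name and the values of `seg_len`, `hop`, `n_fft`, and `p` are hypothetical choices for illustration; the abstract does not specify the segment length, the exponent, or any weighting, so this is not the paper's exact procedure.

```python
import numpy as np

def micro_segment_spectrum(frame, seg_len=32, hop=8, n_fft=256, p=2.0):
    """Integrate short-segment magnitude spectra over time with an Lp norm.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(seg_len)
    starts = range(0, len(frame) - seg_len + 1, hop)
    # One magnitude spectrum per micro segment:
    # shape (n_segments, n_fft // 2 + 1).
    spectra = np.array([
        np.abs(np.fft.rfft(frame[s:s + seg_len] * window, n=n_fft))
        for s in starts
    ])
    # Lp norm along the time (segment) axis, computed per frequency bin.
    return np.sum(spectra ** p, axis=0) ** (1.0 / p)

# Usage: a 30 ms frame at 16 kHz containing a 2 kHz tone.
fs = 16000
t = np.arange(480) / fs
envelope = micro_segment_spectrum(np.sin(2 * np.pi * 2000 * t))
```

Because each micro segment here (2 ms at 16 kHz) is shorter than a typical pitch period, an individual segment spectrum cannot resolve the harmonics of F0. Integrating over many segments then recovers a smooth envelope for the whole analysis frame.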

Bibliographic reference. Aikawa, Kiyoaki (1998): "Speaker-independent speech recognition using micro segment spectrum integration", in ICSLP-1998, paper 0262.