Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Improving the Representation of Time Structure in Front-Ends for Automatic Speech Recognition

Wendy J. Holmes

20/20 Speech Ltd., Malvern Hills Science Park, Malvern, Worcs., UK

This paper describes investigations into the use of ‘excitation-synchronous’ spectral analysis to provide acoustic features for automatic speech recognition. Within each 10 ms frame, the region of maximum power is located and used as the centre of the analysis window for a subsequent Fourier transform. The method has been found to be effective in locating stop bursts and vocal-tract responses to glottal closures. This excitation-synchronous analysis has been compared with the more conventional fixed-interval analysis for window lengths ranging from 5 to 25 ms. In connected-digit recognition experiments using mel-cepstrum features, the excitation-synchronous analysis with a window length of 10 ms gave a 10% improvement in recognition performance compared with the best of the fixed-window conditions.



Bibliographic reference. Holmes, Wendy J. (2000): "Improving the representation of time structure in front-ends for automatic speech recognition", in ICSLP-2000, vol. 2, pp. 1073-1076.