7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper evaluates the use of auditory-based acoustic distinctive features and formant cues for automatic speech recognition (ASR). Comparative experiments indicate that combining either the formant magnitudes or the formant frequencies with some auditory-based acoustic distinctive features and the classical MFCCs within a multi-stream statistical framework improves the recognition performance of HMM-based ASR systems. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments to test the new multi-stream feature vector. A series of speaker-independent continuous-speech recognition experiments was carried out using a subset of the large read-speech corpus TIMIT. Using this multi-stream paradigm, N-mixture triphone models and a bigram language model, we found that the word error rate decreased by about 6.46%.
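As a rough illustration of the multi-stream idea, the sketch below combines per-stream Gaussian-mixture log-likelihoods for one HMM state as a weighted sum in the log domain, which is the usual way stream likelihoods are merged in HTK-style systems. It is not the authors' implementation: the function names, the stream layout (for example MFCCs, distinctive features, formants), the stream weights and all parameter values are assumptions made for illustration only.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture, using log-sum-exp."""
    comps = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


def multistream_log_likelihood(streams, stream_models, stream_weights):
    """Weighted sum of per-stream mixture log-likelihoods for one state.

    streams        : one observation vector per stream (hypothetical layout)
    stream_models  : one (weights, means, variances) tuple per stream
    stream_weights : one exponent per stream (illustrative values)
    """
    return sum(
        gamma * log_mixture(obs, *model)
        for obs, model, gamma in zip(streams, stream_models, stream_weights)
    )


# Toy example: two one-dimensional streams, each modelled by a single
# standard-normal component; all values here are made up.
unit_model = ([1.0], [[0.0]], [[1.0]])
score = multistream_log_likelihood(
    streams=[[0.0], [0.0]],
    stream_models=[unit_model, unit_model],
    stream_weights=[1.0, 0.5],
)
print(score)
```

Setting a stream weight to zero removes that stream from the score, so the same function can be used to compare a baseline stream against combinations with additional streams.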
Bibliographic reference. Tolba, Hesham / Selouani, Sid-Ahmed / O’Shaughnessy, Douglas (2002): "Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for automatic speech recognition using a multi-stream paradigm", In ICSLP-2002, 2113-2116.