The performance of automatic speech recognizers (ASR) typically degrades for test speakers with "outlier" characteristics, for example, speakers with a foreign accent or a fast speaking rate. In this work, we concentrate on the latter. Consistent with the findings of other researchers, we have observed that the word error rate is significantly higher for speakers with an exceptionally high speaking rate. We have investigated two possible causes for this effect. First, inherent spectral differences may cause the features extracted for these outliers to differ significantly from those of normal speech. Second, because of phone omissions and reduced phone durations, word models built for normal speech may not be suitable for fast speech. Based on our exploratory experiments on the TIMIT and WSJ corpora, we believe that spectral differences and duration reduction are both significant sources of the increased error. By adapting our MLP phonetic probability estimator to fast speech and employing word models for fast speakers, we have been able to eliminate about 16% of the word recognition errors for fast speakers.
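The abstract does not say how "exceptionally high speaking rate" is measured, nor how the 16% figure is computed. The sketch below illustrates one plausible reading under stated assumptions; it is not the authors' procedure. It assumes the following:

- Speaking rate is phones per second over non-silence segments of a forced phone alignment.
- A speaker counts as "fast" when their mean rate exceeds the population mean by `k` standard deviations. This threshold is a hypothetical choice.
- "Eliminating about 16% of errors" is read as a relative error reduction.

All function names, silence labels, and thresholds are illustrative.

```python
from statistics import mean, stdev

# Hypothetical silence/pause labels (TIMIT-style); excluded from the rate.
SILENCE = {"h#", "pau", "sil"}


def speaking_rate(phone_segments):
    """Phones per second for one utterance.

    Assumed input: list of (label, start_s, end_s) tuples from a forced
    alignment. Silence segments are dropped from both count and duration.
    """
    speech = [(s, e) for label, s, e in phone_segments if label not in SILENCE]
    duration = sum(e - s for s, e in speech)
    return len(speech) / duration if duration > 0 else 0.0


def flag_fast_speakers(rates_by_speaker, k=1.0):
    """Return speakers whose mean rate exceeds the population mean by k std devs.

    rates_by_speaker maps speaker id -> list of per-utterance rates.
    Needs at least two speakers so that a standard deviation exists.
    The mean + k*std rule is an illustrative assumption.
    """
    speaker_means = {spk: mean(r) for spk, r in rates_by_speaker.items()}
    values = list(speaker_means.values())
    threshold = mean(values) + k * stdev(values)
    return sorted(spk for spk, m in speaker_means.items() if m > threshold)


def relative_error_reduction(baseline_errors, new_errors):
    """Fraction of baseline errors eliminated, e.g. 0.16 for 16%."""
    return (baseline_errors - new_errors) / baseline_errors
```

For example, with invented counts, a system that drops from 500 to 420 word errors on the fast-speaker subset eliminates 16% of those errors. A per-speaker mean is used before thresholding so that speakers with many utterances do not dominate the population statistics.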
Bibliographic reference. Mirghafori, Nikki / Foster, Eric / Morgan, Nelson (1995): "Fast speakers in large vocabulary continuous speech recognition: analysis & antidotes", In EUROSPEECH-1995, 491-494.