September 22-25, 1997
Continuous speech is far more natural and efficient than isolated speech for communication. However, for current state-of-the-art automatic speech recognition systems, isolated speech recognition (ISR) is far more accurate than continuous speech recognition (CSR). It is common practice in the speech research community to build CSR systems using only CSR data. In doing so, we ignore the fact that isolated (a.k.a. discrete) speech is a special case of continuous speech. Slowing the speaking rate is a natural reaction for a user faced with the high error rates of current CSR systems. Ironically, CSR systems typically have a much higher word error rate when speakers slow down, since the acoustic models are usually derived exclusively from continuous speech corpora. In this paper, we summarize our efforts to improve the robustness of our speaker-independent CSR system without suffering a recognition accuracy penalty. In particular, the multi-style trained system described in this paper attains a 7.0% word error rate on a test set consisting of both isolated and continuous speech, in contrast to the 10.9% word error rate achieved by the same system trained only on continuous speech.
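For readers unfamiliar with the metric, the sketch below shows one conventional way to compute a word error rate (word-level edit distance divided by the number of reference words) and the relative reduction implied by the two figures quoted above. It is an illustration only, not the scoring procedure used in the paper; the function names `word_error_rate` and `relative_reduction` and the example sentences are assumptions introduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; the paper does not specify
    its scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up sentences: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Figures from the abstract: 10.9% (continuous-only) vs. 7.0% (multi-style).
    print(f"{relative_reduction(10.9, 7.0):.1%}")
```

Under this definition, going from 10.9% to 7.0% corresponds to a relative error reduction of roughly 36% on the mixed test set.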
Bibliographic reference. Alleva, Fil / Huang, Xuedong / Hwang, Mei-Yuh / Jiang, Li (1997): "Can continuous speech recognizers handle isolated speech?", In EUROSPEECH-1997, 911-914.