ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

An Assessment of Automatic Recognition Techniques for Spontaneous Speech in Comparison with Human Performance

Takahiro Shinozaki, Sadaoki Furui

Department of Computer Science, Tokyo Institute of Technology, Japan

To investigate the problems of spontaneous speech recognition using N-grams and HMMs, and to estimate the room for improvement in the recognition rate, an automatic speech recognizer is evaluated in comparison with human listeners. The evaluation task is to recognize spontaneous presentation speech from the Corpus of Spontaneous Japanese. Both the automatic recognizer and the human listeners are asked to choose the most likely word from a dictionary, given a three-word-long speech segment extracted from a presentation: the target word plus one word of context on each side. Recognition performance is compared using the same criteria in both experiments. The results show that the recognition error rate of the human listeners is roughly half that of the recognizer. Examining words that are easy for humans but difficult for the recognizer shows that the causes of the decoder's recognition errors include insufficient model accuracy and a lack of robustness against vague and variable pronunciations.
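The evaluation protocol can be illustrated with a small sketch. It assumes that each test item is a target word with one context word on each side, and that an answer counts as an error whenever the chosen word differs from the target. The helper names `three_word_segments` and `center_word_error_rate` are hypothetical and are not taken from the paper. The sketch does not model the audio, the dictionary, or the decoder. It only shows how one scoring function can be applied unchanged to both the recognizer's answers and the listeners' answers.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, str, str]  # (left context, target, right context)


def three_word_segments(words: Sequence[str]) -> List[Segment]:
    """Hypothetical helper: cut a transcript into three-word windows.

    Assumes one context word on each side of the target, so the first
    and last words of the transcript only ever serve as context.
    """
    return [(words[i - 1], words[i], words[i + 1]) for i in range(1, len(words) - 1)]


def center_word_error_rate(segments: Sequence[Segment], answers: Sequence[str]) -> float:
    """Hypothetical scoring: fraction of segments whose target word was missed.

    The same function is meant to score the recognizer and the human
    listeners, so both are judged by one criterion.
    """
    if not segments:
        raise ValueError("no segments to score")
    if len(segments) != len(answers):
        raise ValueError("need exactly one answer per segment")
    errors = sum(1 for (_, target, _), answer in zip(segments, answers) if answer != target)
    return errors / len(segments)


if __name__ == "__main__":
    # Toy placeholder transcript, not data from the corpus.
    segs = three_word_segments(["a", "b", "c", "d", "e"])
    print(segs)  # [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
    print(center_word_error_rate(segs, ["b", "x", "d"]))  # one miss out of three
```

Because this is a forced choice of a single word per segment, the error rate is a simple proportion rather than an alignment-based word error rate with insertions and deletions.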


Bibliographic reference.  Shinozaki, Takahiro / Furui, Sadaoki (2003): "An assessment of automatic recognition techniques for spontaneous speech in comparison with human performance", in SSPR-2003, paper MAP15.