ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
To investigate the problems of spontaneous speech recognition using N-grams and HMMs, and to estimate the room for improvement in the recognition rate, an automatic speech recognizer is evaluated against the performance of human listeners. The evaluation task is to recognize spontaneous speech presentations from the Corpus of Spontaneous Japanese. Both the automatic recognizer and the human listeners are asked to choose the most likely word from a dictionary, given a three-word speech segment extracted from a presentation: the target word plus one word of context on each side. Recognition performance is compared using the same criteria in both experiments. The results show that the recognition error rate of human listeners is roughly half that of the recognizer. An examination of words that are easy for humans but difficult for the recognizer indicates that the causes of the recognizer's errors include insufficient model accuracy and a lack of robustness against vague and variable pronunciations.
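The comparison described in the abstract can be illustrated with a minimal sketch. It assumes each trial yields one reference target word and one chosen word per system; the function names and the toy word lists are hypothetical and are not taken from the paper or its data.

```python
from typing import List, Sequence


def error_rate(references: Sequence[str], choices: Sequence[str]) -> float:
    """Fraction of trials in which the chosen word differs from the reference."""
    if len(references) != len(choices):
        raise ValueError("references and choices must have the same length")
    if not references:
        raise ValueError("at least one trial is required")
    errors = sum(ref != hyp for ref, hyp in zip(references, choices))
    return errors / len(references)


def human_easy_machine_hard(
    references: Sequence[str],
    human: Sequence[str],
    machine: Sequence[str],
) -> List[str]:
    """Reference words the human got right and the recognizer got wrong."""
    return [
        ref
        for ref, h, m in zip(references, human, machine)
        if h == ref and m != ref
    ]


# Made-up toy trials (assumption, for illustration only).
references = ["kyou", "wa", "onsei", "ninshiki"]
human_choices = ["kyou", "wa", "onsei", "ninchi"]
machine_choices = ["kyou", "ga", "onsen", "ninshiki"]

human_err = error_rate(references, human_choices)
machine_err = error_rate(references, machine_choices)
hard_words = human_easy_machine_hard(references, human_choices, machine_choices)
```

Scoring both sets of choices with the same function mirrors the requirement that identical criteria be applied to both experiments; the second helper collects the kind of words the abstract singles out for error analysis.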
Bibliographic reference. Shinozaki, Takahiro / Furui, Sadaoki (2003): "An assessment of automatic recognition techniques for spontaneous speech in comparison with human performance", in SSPR-2003, paper MAP15.