Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Multiple-Level Evaluation of Speech Recognition Systems

John F. Pitrelli, David Lubensky, Benjamin Chigier, Hong C. Leung

Speech Technology Group, Artificial Intelligence Laboratory, NYNEX Science & Technology, Inc., White Plains, NY, USA

Evaluations of speech recognizers typically focus on somewhat idealized versions of the types of utterances the recognizer would confront in a real application. One issue that our group has discussed previously is the use of laboratory speech rather than real-user speech, and the resulting over-optimistic projections of performance. This paper focuses on another problem: typical evaluations often exclude some or all classes of utterances that would occur in a real application but do not precisely match the type of input for which the recognizer was designed. Example categories include utterances containing excess words not in the recognizer's vocabulary (non-target speech), utterances that lack target speech, and "utterances" lacking any speech at all. Such inputs to a recognizer may result from non-compliant users, or from pre-processing errors such as imperfect endpointing. We propose a more comprehensive evaluation strategy, using as an example an evaluation of a recognition system prototype for a city-name-recognition application. Our strategy is designed to meet two goals: to evaluate automation potential realistically, and to provide diagnostic information that pinpoints directions for future work on the system. To these ends, our evaluation treats both the overall system and the individual component modules within it. We learn that a surprisingly wide variety of "recognition rates" can meaningfully describe the accuracy of a system or portions of it. Consequently, accuracy statistics must be interpreted and/or compared with extreme care.
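To illustrate why several different "recognition rates" can describe the same system, the following minimal Python sketch scores one set of utterances three ways, depending on which input categories are counted. It is not the authors' evaluation procedure: the category names, the `Utterance` type, the scoring rule (a rejection counts as correct when no target speech is present), and all counts and city names are hypothetical toy values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Utterance:
    """One test input (hypothetical structure, not taken from the paper)."""
    category: str           # e.g. "target_only", "target_plus_extra", "no_target", "no_speech"
    truth: Optional[str]    # city actually spoken, or None if no target speech is present
    output: Optional[str]   # city returned by the recognizer, or None if it rejected the input


def recognition_rate(utts: Iterable[Utterance], include: Set[str]) -> float:
    """Fraction of inputs in the chosen categories that were handled correctly.

    Assumed scoring rule: an input is handled correctly when the output equals
    the truth, so rejecting (None) an input with no target speech counts as correct.
    """
    pool = [u for u in utts if u.category in include]
    if not pool:
        raise ValueError("no utterances in the selected categories")
    return sum(u.output == u.truth for u in pool) / len(pool)


# Toy data (invented for illustration only).
data = (
    [Utterance("target_only", "Boston", "Boston")] * 7
    + [Utterance("target_only", "Boston", "Austin")]
    + [Utterance("target_plus_extra", "Albany", "Albany")] * 2
    + [Utterance("target_plus_extra", "Albany", None),
       Utterance("target_plus_extra", "Albany", "Boston")]
    + [Utterance("no_target", None, None),
       Utterance("no_target", None, "Albany")]
    + [Utterance("no_speech", None, None),
       Utterance("no_speech", None, "Boston")]
)

# Rate over idealized inputs only, as in a typical laboratory-style evaluation.
clean_rate = recognition_rate(data, {"target_only"})
# Rate over every input that contains target speech, including excess words.
target_rate = recognition_rate(data, {"target_only", "target_plus_extra"})
# Rate over everything the system would actually confront.
overall_rate = recognition_rate(
    data, {"target_only", "target_plus_extra", "no_target", "no_speech"}
)

print(f"target-only inputs:   {clean_rate:.4f}")
print(f"all target-bearing:   {target_rate:.4f}")
print(f"all inputs:           {overall_rate:.4f}")
```

With these invented counts, the same recognizer output yields three different figures, each of which could reasonably be reported as a "recognition rate"; comparing two systems is only meaningful when the included categories and the scoring rule match.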

