8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Predicting Word Correct Rate from Acoustic and Linguistic Confusability

Gies Bouwman, Bert Cranen, Lou Boves

Radboud University Nijmegen, Netherlands

When an existing ASR application is adapted to a different user environment, the speech it encounters often does not entirely match the training situation. The mismatch may have both acoustic and linguistic causes. In this paper we explore to what extent the word correct rate (WCR) for a given test set can be predicted from the transcription alone (i.e. the linguistic representation), under the assumption that acoustic conditions are matched. We hope that, eventually, such a prediction can provide an estimate of a lower bound on the word error rate (WER) to aim for when applying acoustic enhancement techniques. We propose and compute measures of acoustic and linguistic confusability (AC and LC) for each entry in the vocabulary of an ASR engine. Using a tabulation of how the correctness of actual recognition on a development set varies as a function of these confusability measures, we show that the observed WCR of words from independent test sets can be predicted with high accuracy over the full ranges of AC and LC levels.
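The tabulation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how AC and LC are computed or how their levels are quantized. The sketch therefore assumes each word token already carries integer AC and LC levels, and the names `build_table`, `predict_wcr` and `fallback` are hypothetical.

```python
from collections import defaultdict


def build_table(dev_tokens):
    """Tabulate the observed correct rate per (AC level, LC level) cell.

    dev_tokens: iterable of (ac_level, lc_level, correct) tuples taken from
    recognition results on a development set. The level values are assumed
    to be precomputed; how they are derived is outside this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # (ac, lc) -> [n_correct, n_total]
    for ac, lc, correct in dev_tokens:
        cell = counts[(ac, lc)]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in counts.items()}


def predict_wcr(table, test_tokens, fallback):
    """Predict the WCR of a test set as the mean of the per-cell rates.

    test_tokens: iterable of (ac_level, lc_level) pairs from the test
    transcription. Cells never seen in the development set use `fallback`,
    which is an assumption of this sketch.
    """
    rates = [table.get(key, fallback) for key in test_tokens]
    if not rates:
        raise ValueError("test set is empty")
    return sum(rates) / len(rates)


# Toy development data: low-confusability words are recognized more often.
dev = [
    (0, 0, True), (0, 0, True), (0, 0, True), (0, 0, False),
    (1, 1, True), (1, 1, False),
]
table = build_table(dev)
predicted = predict_wcr(table, [(0, 0), (1, 1)], fallback=0.5)
```

In this toy setup the prediction is simply the average of the tabulated rates for the cells that the test words fall into; no recognition run on the test set is needed.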


Bibliographic reference. Bouwman, Gies / Cranen, Bert / Boves, Lou (2004): "Predicting word correct rate from acoustic and linguistic confusability", in INTERSPEECH-2004, 1481-1484.