INTERSPEECH 2004 - ICSLP
When an existing ASR application is adapted to a different user environment, the speech it encounters often does not fully match the training conditions. The mismatch may have both acoustic and linguistic causes. In this paper we explore to what extent the word correct rate (WCR) for a given test set can be predicted from the transcription alone (i.e. the linguistic representation), under the assumption that acoustic conditions are matched. We hope that such a prediction can eventually provide an estimate of a lower bound on the word error rate (WER) to aim for when applying acoustic enhancement techniques. To this end, we propose and compute measures of acoustic and linguistic confusability (AC and LC) for each entry in the vocabulary of an ASR engine. Using a tabulation of how the correctness of actual recognition on a development set varies as a function of these confusability measures, we show that the observed WCR of words from independent test sets can be predicted with high accuracy over the full range of AC and LC levels.
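The tabulation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how AC and LC are computed or how the table is organised, so the sketch assumes that each word token already carries numeric AC and LC scores, that the scores are grouped into bins by hypothetical edge values, and that the predicted WCR of a test set is the mean of the per-bin correct rates observed on the development set. The names `bin_index`, `build_table`, and `predict_wcr` are illustrative only.

```python
from collections import defaultdict


def bin_index(value, edges):
    """Return the index of the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)  # value lies at or above the last edge


def build_table(dev_tokens, ac_edges, lc_edges):
    """Tabulate the observed correct rate per (AC bin, LC bin) cell.

    dev_tokens: iterable of (ac, lc, correct) tuples from a development set,
    where `correct` says whether the recogniser got that token right.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_correct, n_total]
    for ac, lc, correct in dev_tokens:
        cell = (bin_index(ac, ac_edges), bin_index(lc, lc_edges))
        counts[cell][0] += int(correct)
        counts[cell][1] += 1
    return {cell: n_ok / n for cell, (n_ok, n) in counts.items()}


def predict_wcr(test_words, table, ac_edges, lc_edges, fallback):
    """Predict WCR as the mean tabulated correct rate over the test tokens.

    test_words: iterable of (ac, lc) pairs derived from the transcription.
    fallback: rate used for cells never seen in the development set
    (an assumption of this sketch, e.g. the overall development WCR).
    """
    rates = [
        table.get((bin_index(ac, ac_edges), bin_index(lc, lc_edges)), fallback)
        for ac, lc in test_words
    ]
    return sum(rates) / len(rates)


# Toy data, invented for illustration only.
dev = [(0.1, 0.2, True), (0.1, 0.3, True), (0.8, 0.9, False), (0.7, 0.8, True)]
edges = [0.5]  # a single hypothetical bin edge, giving "low" and "high" bins
table = build_table(dev, edges, edges)
predicted = predict_wcr([(0.2, 0.1), (0.9, 0.6)], table, edges, edges, fallback=0.75)
print(predicted)
```

In this toy setting, low-confusability tokens fall into a cell with a high observed correct rate and high-confusability tokens into a cell with a lower one, so the prediction for a test set depends only on how its words are distributed over the cells.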
Bibliographic reference. Bouwman, Gies / Cranen, Bert / Boves, Lou (2004): "Predicting word correct rate from acoustic and linguistic confusability", In INTERSPEECH-2004, 1481-1484.