Sixth European Conference on Speech Communication and Technology
Although perplexity correlates well with word error rate (WER) within a simple n-gram framework such as the Wall Street Journal task, it has been reported that perplexity correlates poorly with WER when more complicated LMs are used. In this paper, a global measure for language model evaluation is proposed that achieves higher correlation with word accuracy. The metric is based on the difference in LM score between a word in the evaluation text and the word that gives the maximum score in that context. Two experiments were carried out to investigate the correlation between word accuracy and the proposed measure. In the first experiment, LMs were created using n-gram adaptation by n-gram count mixture. 47 LMs were created by varying the mixture weight and the vocabulary cut-off threshold. Correlation between perplexity and word accuracy was very poor (correlation coefficient -0.36). On the other hand, the proposed metric gave much higher correlation (correlation coefficient 0.82). In the second experiment, a simple mixture trigram model was employed to recognize Switchboard task data. The highest correlation between word accuracy and the proposed measure was 0.81, which was much higher than the correlation between perplexity and accuracy (0.59).
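The abstract does not give the exact formula, so the following is only a minimal sketch of the idea it describes, contrasted with perplexity. It assumes that the "LM score" is a log probability, that the measure is averaged over the words of the evaluation text, and that the sign is chosen so that values closer to zero are better; the function names `perplexity` and `mean_score_gap` and the toy distributions are hypothetical and not taken from the paper.

```python
import math


def perplexity(probs):
    """Perplexity from the per-word probabilities P(w_i | h_i)."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def mean_score_gap(contexts):
    """Average log-score gap between the actual word and the best word.

    `contexts` is a list of (actual_word, distribution) pairs, where
    `distribution` maps each candidate word to P(word | history).
    The result is <= 0 and equals 0 only when the actual word is
    always the top-scoring candidate. (Assumed form, not the paper's.)
    """
    total = 0.0
    for word, dist in contexts:
        best = max(dist.values())
        total += math.log(dist[word]) - math.log(best)
    return total / len(contexts)


# Toy evaluation text of three words with made-up distributions.
contexts = [
    ("the", {"the": 0.5, "a": 0.3, "cat": 0.2}),
    ("cat", {"the": 0.1, "a": 0.1, "cat": 0.8}),
    ("sat", {"sat": 0.25, "ran": 0.5, "the": 0.25}),
]

pp = perplexity([dist[word] for word, dist in contexts])
gap = mean_score_gap(contexts)
print(f"perplexity = {pp:.4f}, mean score gap = {gap:.4f}")
```

Unlike perplexity, which depends only on the probability assigned to the correct word, a gap of this kind also reflects how strongly a competing word is preferred in the same context, which is plausibly closer to what causes recognition errors.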
Bibliographic reference. Ito, Akinori / Kohda, Masaki / Ostendorf, Mari (1999): "A new metric for stochastic language model evaluation", In EUROSPEECH'99, 1591-1594.