ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium
September 18-20, 2000
Continuous Speech Recognition (CSR) systems require a Language Model (LM) to represent the syntactic constraints of the language. A subclass of the regular languages, the k-Testable in the Strict Sense (k-TSS) languages, has been used to generate LMs. A smoothing technique must then be applied so that events not represented in the training corpus are also considered. In this work, a new syntactic back-off smoothing approach, Delimited discounting, was applied to several pruned and non-pruned k-TSS LMs. Delimited discounting addresses the problems of Turing discounting while keeping Katz's smoothing scheme. The experimental evaluation was carried out on a Spanish speech application task. It showed that an increase in the test-set perplexity of an LM does not always mean a degradation in model performance when the LM is integrated into a CSR system. In addition, there is a strong dependence between the amount of probability that the smoothing technique reserves for unseen events and the value of the balance parameter that must be applied to the LM probabilities in Bayes' rule to obtain the best system performance.
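For readers unfamiliar with the two quantities the abstract relates, the following is a sketch in standard textbook notation; the symbols are assumptions for illustration and are not taken from the paper itself. Here $X$ is the acoustic observation sequence, $W = w_1 \dots w_N$ a word sequence, $\alpha$ the balance parameter that scales the LM probabilities, and $k$ the order of the k-TSS model, assumed to condition each word on the previous $k-1$ words.

```latex
% Decoding with a balance parameter \alpha applied to the LM term of Bayes' rule
\hat{W} = \arg\max_{W} \; P(X \mid W)\, P(W)^{\alpha}

% Test-set perplexity of the LM over N test words
PP = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \ln P\!\left(w_i \mid w_{i-k+1}, \dots, w_{i-1}\right) \right)
```

Under this reading, perplexity depends only on the LM, whereas the recognized word sequence depends on the LM probabilities after scaling by $\alpha$, which is one way to see why a change in perplexity need not translate directly into a change in word error rate.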
Bibliographic reference. Varona, A. / Torres, I. (2000): "Delimited smoothing technique over pruned and not pruned syntactic language models: perplexity and WER", In ASR-2000, 69-76.