In this paper, the speech understanding problem in the context of a spoken dialog system is formalized in a maximum likelihood framework. Word and dialog-state n-grams are used to build categorical understanding and dialog models, respectively. Acoustic confidence scores are incorporated into the understanding formulation. Problems due to data sparseness and out-of-vocabulary words are discussed. For a computer gaming application, incorporating dialog models reduces the relative understanding error rate by 15-25%, while acoustic confidence scores achieve a further 10% error reduction.
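The abstract does not spell out the exact formulation, so the following is only a minimal sketch of how a maximum-likelihood categorical classifier of this kind could combine the three ingredients it names: a word n-gram per category, a dialog-state n-gram acting as a prior over categories, and acoustic confidence scores. Add-one smoothing, mapping out-of-vocabulary words to `<unk>`, weighting each word's log-likelihood by its confidence, and all names (`BigramModel`, `train_understanding`, `classify`) and toy data are hypothetical assumptions, not the authors' method.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model over a fixed set of next tokens (hypothetical helper)."""

    def __init__(self, sequences, tokens):
        self.tokens = set(tokens)  # every token that may follow a history
        self.pairs = Counter()
        self.history = Counter()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pairs[(prev, cur)] += 1
                self.history[prev] += 1

    def logprob(self, prev, cur):
        # Add-one smoothing mitigates data sparseness for unseen pairs.
        return math.log((self.pairs[(prev, cur)] + 1) /
                        (self.history[prev] + len(self.tokens)))


def train_understanding(examples, dialogs):
    """examples: {category: [word lists]}; dialogs: [sequences of categories]."""
    vocab = {w for sents in examples.values() for s in sents for w in s}
    word_tokens = vocab | {"</s>", "<unk>"}
    word_models = {
        c: BigramModel([["<s>"] + s + ["</s>"] for s in sents], word_tokens)
        for c, sents in examples.items()
    }
    # Dialog model: bigram over successive dialog states (categories).
    dialog_model = BigramModel([["<start>"] + d for d in dialogs], set(examples))
    return vocab, word_models, dialog_model


def classify(words, confidences, prev_state, vocab, word_models, dialog_model):
    """Return argmax_c log P(c | prev) + sum_i conf_i * log P(w_i | w_{i-1}, c)."""
    tokens = [w if w in vocab else "<unk>" for w in words]  # OOV words -> <unk>
    # Assumption (not from the paper): confidence scales each word's log-likelihood;
    # the end-of-sentence token gets weight 1.
    weights = list(confidences) + [1.0]
    seq = ["<s>"] + tokens + ["</s>"]
    best, best_score = None, -math.inf
    for c, lm in word_models.items():
        score = dialog_model.logprob(prev_state, c)
        for wgt, (prev, cur) in zip(weights, zip(seq, seq[1:])):
            score += wgt * lm.logprob(prev, cur)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy, made-up training data for a hypothetical game.
examples = {
    "move": [["go", "north"], ["walk", "north"], ["go", "east"]],
    "take": [["take", "the", "sword"], ["pick", "up", "sword"]],
}
dialogs = [["move", "take", "move"], ["move", "move", "take"]]
vocab, wm, dm = train_understanding(examples, dialogs)
print(classify(["go", "west"], [0.9, 0.4], "move", vocab, wm, dm))  # prints "move"
```

In this toy run the dialog prior alone slightly favors `take` after a `move` state, but the word model for `move` outweighs it even though "west" is out of vocabulary. Lowering a word's confidence reduces how much that word's evidence counts toward the decision.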
Cite as: Potamianos, A., Riccardi, G., Narayanan, S. (1999) Categorical understanding using statistical ngram models. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2027-2030, doi: 10.21437/Eurospeech.1999-448
@inproceedings{potamianos99b_eurospeech, author={Alexandros Potamianos and Giuseppe Riccardi and Shrikanth Narayanan}, title={{Categorical understanding using statistical ngram models}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={2027--2030}, doi={10.21437/Eurospeech.1999-448} }