Sixth European Conference on Speech Communication and Technology
In this paper, the speech understanding problem in the context of a spoken dialog system is formalized in a maximum likelihood framework. Word and dialog-state n-grams are used for building categorical understanding and dialog models, respectively. Acoustic confidence scores are incorporated in the understanding formulation. Problems due to data sparseness and out-of-vocabulary words are discussed. Incorporating dialog models reduces the relative understanding error rate by 15-25%, while acoustic confidence scores achieve a further 10% error reduction for a computer gaming application.
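As an illustration only, and not the authors' implementation, the combination of word n-grams and dialog-state n-grams can be sketched as choosing the category c that maximizes log P(W | c) + log P(c | previous state). Everything below is an assumption made for the sketch: the bigram order, the add-one smoothing, the `<unk>` mapping for out-of-vocabulary words, the toy training sentences, the dialog-state probabilities, and all function names. Acoustic confidence scores are left out.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"

def train_bigram(sentences, vocab):
    """Count bigrams and their history tokens for one category's sentences."""
    bigrams, histories = Counter(), Counter()
    for words in sentences:
        tokens = [BOS] + [w if w in vocab else UNK for w in words] + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories

def log_likelihood(words, model, vocab):
    """Add-one smoothed log P(W | category); out-of-vocabulary words map to UNK."""
    bigrams, histories = model
    size = len(vocab) + 2  # predictable tokens: vocabulary plus EOS and UNK
    tokens = [BOS] + [w if w in vocab else UNK for w in words] + [EOS]
    return sum(
        math.log((bigrams[(p, c)] + 1) / (histories[p] + size))
        for p, c in zip(tokens, tokens[1:])
    )

def classify(words, prev_state, word_models, state_bigram, vocab):
    """Return the category maximizing log P(W | c) + log P(c | prev_state)."""
    return max(
        word_models,
        key=lambda c: log_likelihood(words, word_models[c], vocab)
        + math.log(state_bigram[prev_state][c]),
    )

# Hypothetical toy data for a game-like command set (not from the paper).
training = {
    "move": [["go", "north"], ["go", "south"], ["move", "north"]],
    "take": [["take", "the", "key"], ["grab", "the", "key"]],
}
vocab = {w for sents in training.values() for s in sents for w in s}
word_models = {c: train_bigram(s, vocab) for c, s in training.items()}

# Hypothetical dialog-state bigram P(category | previous state).
state_bigram = {
    "start": {"move": 0.7, "take": 0.3},
    "move": {"move": 0.2, "take": 0.8},
}

print(classify(["go", "north"], "start", word_models, state_bigram, vocab))
print(classify(["xyzzy"], "start", word_models, state_bigram, vocab))
print(classify(["xyzzy"], "move", word_models, state_bigram, vocab))
```

In this toy setup the word models say almost nothing about the out-of-vocabulary utterance `xyzzy`, so the dialog-state term decides the category. This is the kind of case where a dialog model can help an understanding system.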
Bibliographic reference. Potamianos, Alexandros / Riccardi, Giuseppe / Narayanan, Shrikanth (1999): "Categorical understanding using statistical ngram models", In EUROSPEECH'99, 2027-2030.