8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A PLSA-Based Language Model for Conversational Telephone Speech

David Mrva, Philip C. Woodland

Cambridge University, UK

This paper describes experiments with a language model based on probabilistic latent semantic analysis (PLSA) for conversational telephone speech. The model uses a long-range history and exploits topic information in the test text to adjust the probabilities of test words. Compared with a traditional word+class-based 4-gram, the PLSA-based model reduced test-set perplexity by 13% (an optimistic estimate using the reference transcript as history) or by 6% (a realistic estimate using the recognised transcript as history). The paper also introduces the use of confidence scores to weight words in the history, a weight on the prior topic distribution, and a way of calculating perplexity that accounts for recognition errors in the model context.
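The abstract names the model's ingredients without giving formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard PLSA word probability P(w|h) = sum over z of P(w|z) P(z|h), with the topic mix P(z|h) estimated from the history by EM "folding-in" while P(w|z) stays fixed. How the paper applies confidence scores and the prior weight is not stated here: treating each confidence as a fractional word count and the prior as `prior_weight`-scaled pseudo-counts is an assumption, and the function names (`fold_in_topics`, `plsa_prob`, `perplexity`) and toy numbers are hypothetical. For brevity, the topic mix is estimated once from a fixed history rather than updated word by word, and the combination with the 4-gram is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence

TopicModel = List[Dict[str, float]]  # topic index z -> {word: P(w | z)}


def fold_in_topics(history: Sequence[str],
                   p_w_given_z: TopicModel,
                   prior: Sequence[float],
                   conf: Optional[Sequence[float]] = None,
                   prior_weight: float = 1.0,
                   iters: int = 20) -> List[float]:
    """Estimate P(z | h) from a history by EM folding-in, keeping P(w | z) fixed.

    Assumed (hypothetical) scheme: each history word adds its confidence
    score as a fractional count, and the prior adds prior_weight * P(z)
    pseudo-counts in every M-step.
    """
    k = len(prior)
    weights = [1.0] * len(history) if conf is None else list(conf)
    theta = list(prior)
    for _ in range(iters):
        counts = [prior_weight * prior[z] for z in range(k)]
        for word, c in zip(history, weights):
            # E-step: posterior P(z | word, h) under the current theta.
            joint = [p_w_given_z[z].get(word, 0.0) * theta[z] for z in range(k)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word unknown to every topic: carries no evidence
            for z in range(k):
                counts[z] += c * joint[z] / norm
        total = sum(counts)
        if total == 0.0:
            break  # no evidence and no prior mass: keep the current theta
        # M-step: renormalise the expected counts into a distribution.
        theta = [x / total for x in counts]
    return theta


def plsa_prob(word: str, theta: Sequence[float], p_w_given_z: TopicModel) -> float:
    """P(w | h) = sum_z P(w | z) * P(z | h)."""
    return sum(t * p_w_given_z[z].get(word, 0.0) for z, t in enumerate(theta))


def perplexity(probs: Sequence[float]) -> float:
    """Exponential of the average negative log probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


# Toy two-topic model (made-up numbers, for illustration only).
topics: TopicModel = [
    {"game": 0.4, "team": 0.4, "the": 0.2},
    {"vote": 0.4, "party": 0.4, "the": 0.2},
]
prior = [0.5, 0.5]

# "Realistic" setting: the topic mix comes from the recognised words,
# weighted by hypothetical confidence scores, while the words being
# scored come from the reference transcript.
recognised = ["the", "team", "game"]
confidences = [0.9, 0.8, 0.3]
reference = ["the", "game"]

theta = fold_in_topics(recognised, topics, prior, conf=confidences)
ppl_adapted = perplexity([plsa_prob(w, theta, topics) for w in reference])
ppl_prior = perplexity([plsa_prob(w, prior, topics) for w in reference])
```

In this sketch a large `prior_weight` keeps P(z|h) close to the training prior, which limits the effect of a short or misrecognised history, while a value near zero lets the history dominate. A low confidence score (0.3 for "game" above) likewise shrinks that word's pull on the topic mix.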


Bibliographic reference. Mrva, David / Woodland, Philip C. (2004): "A PLSA-based language model for conversational telephone speech", in INTERSPEECH-2004, pp. 2257-2260.