8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A PLSA-Based Language Model for Conversational Telephone Speech

David Mrva, Philip C. Woodland

Cambridge University, UK

This paper describes experiments with a language model based on probabilistic latent semantic analysis (PLSA) for conversational telephone speech. The model uses a long-range history and exploits topic information in the test text to adjust the probabilities of test words. It lowered test set perplexity relative to a traditional word+class-based 4-gram by 13% (an optimistic estimate that uses the reference transcript as history) or by 6% (a realistic estimate that uses the recognised transcript as history). The paper also introduces the use of confidence scores to weight words in the history, a weight on the prior topic distribution, and a way of calculating perplexity that accounts for recognition errors in the model context.

