In this paper, we propose a PLSA-based language model for live sports speech. The model is implemented with the unigram rescaling technique, which combines a topic model with an n-gram model. In the conventional method, unigram rescaling is performed with a topic distribution estimated from the history of recognized transcriptions. This method can improve performance; however, it cannot express topic transitions. Incorporating the concept of topic transition is expected to improve recognition performance. The proposed method therefore employs a "Topic HMM" instead of a history to estimate the topic distribution. The Topic HMM is a discrete ergodic HMM that expresses typical topic distributions and topic transition probabilities. Word accuracy results indicate an improvement over both a trigram model and the conventional PLSA-based method that uses a recognized history.
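As an illustration of the unigram rescaling step mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used form in which each n-gram probability P(w|h) is multiplied by the ratio P(w|topic)/P(w) and then renormalized; the abstract does not state the exact formula, so any exponent or smoothing the paper may apply is omitted. The function names, the toy vocabulary, and all probability values are hypothetical.

```python
def topic_unigram(word_given_topic, topic_weights):
    """Mix per-topic word distributions: P(w|d) = sum_z P(w|z) * P(z|d)."""
    vocab = word_given_topic[0].keys()
    return {
        w: sum(pz * pwz[w] for pz, pwz in zip(topic_weights, word_given_topic))
        for w in vocab
    }


def unigram_rescale(ngram_probs, topic_probs, background_probs):
    """Scale P(w|h) by P(w|topic) / P(w), then renormalize over the vocabulary."""
    scaled = {
        w: p * topic_probs[w] / background_probs[w]
        for w, p in ngram_probs.items()
    }
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


# Toy values (assumed for illustration only).
ngram_probs = {"goal": 0.2, "pitch": 0.3, "the": 0.5}       # P(w | history)
background = {"goal": 0.1, "pitch": 0.1, "the": 0.8}        # P(w)
word_given_topic = [
    {"goal": 0.30, "pitch": 0.10, "the": 0.60},             # topic 0
    {"goal": 0.05, "pitch": 0.35, "the": 0.60},             # topic 1
]
topic_weights = [1.0, 0.0]  # P(z | d), e.g. supplied by a history or a topic HMM state

rescaled = unigram_rescale(
    ngram_probs, topic_unigram(word_given_topic, topic_weights), background
)
```

In this sketch the topic weights are simply given. In the conventional method they would be estimated from the recognized history, whereas the proposed method obtains the topic distribution from the Topic HMM.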
Bibliographic reference. Sako, Atsushi / Takiguchi, Tetsuya / Ariki, Yasuo (2007): "Language modeling using PLSA-based topic HMM", In INTERSPEECH-2007, 606-609.