5th International Conference on Spoken Language Processing
This paper describes a thesaurus-based class n-gram model for broadcast news transcription. The most important issue for class n-gram models is how to build the word classification. We construct a thesaurus-based word-class mapping that maximizes the average mutual information on a training corpus. To examine the effectiveness of the new method, we compare it with two of our previous methods, which use the same thesaurus but determine the word-class mapping in different ways. The new method achieved substantially lower perplexity than the previous methods on 83 sentences of news transcripts broadcast on June 4, 1996.
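The abstract does not spell out the search procedure, so the following is only a minimal sketch of the selection criterion. It assumes a word-class mapping is a plain dictionary and that candidate mappings (for example, different cuts of the thesaurus hierarchy) are supplied by the caller. The function names and the "UNK" fallback class are hypothetical. The code computes the average mutual information between the classes of adjacent words, I = Σ p(c1,c2) log2 [p(c1,c2) / (p(c1) p(c2))], and keeps the candidate with the highest value.

```python
import math
from collections import Counter
from typing import Callable, Dict, Sequence


def average_mutual_information(tokens: Sequence[str],
                               word_class: Callable[[str], str]) -> float:
    """Average mutual information (bits) between classes of adjacent words."""
    classes = [word_class(w) for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))  # class bigram counts
    total = sum(bigrams.values())
    left: Counter = Counter()   # marginal counts of the first class in a pair
    right: Counter = Counter()  # marginal counts of the second class in a pair
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n
    ami = 0.0
    for (c1, c2), n in bigrams.items():
        # p(c1,c2) / (p(c1) p(c2)) simplifies to n * total / (left * right)
        ami += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return ami


def best_mapping(tokens: Sequence[str],
                 candidates: Dict[str, Dict[str, str]]) -> str:
    """Hypothetical helper: name of the candidate mapping with the highest AMI."""
    def score(name: str) -> float:
        mapping = candidates[name]
        # Words missing from a mapping fall into an assumed "UNK" class.
        return average_mutual_information(tokens, lambda w: mapping.get(w, "UNK"))
    return max(candidates, key=score)
```

One design caveat: splitting a class into finer classes never lowers this quantity, so candidates should be compared at a fixed number of classes; otherwise the finest available cut always wins.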
Bibliographic reference. Ando, Akio / Kobayashi, Akio / Imai, Toru (1998): "A thesaurus-based statistical language model for broadcast news transcription", In ICSLP-1998, paper 0016.