5th International Conference on Spoken Language Processing
This paper reports our work on improving a bigram language model for Japanese TV broadcast news speech recognition. First, frequent word strings were grouped into phrases so that the phrases could be added to the lexicon as new recognition units. Test-set perplexity improved when frequent function word strings were used as additional recognition units. Speech recognition performance improved both when function word strings were grouped and when compound nouns selected by word association ratio were grouped. Second, to alleviate the out-of-vocabulary (OOV) problem for nouns, we built and tested a language model that can switch its noun lexicon according to the domain of the article to be recognized next.
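The phrase-grouping step can be illustrated with a minimal sketch. It assumes that "word association ratio" refers to a mutual-information-style score, log2(P(x,y) / (P(x)P(y))), computed over adjacent word pairs; the abstract does not give the exact definition. The function names, thresholds, underscore joiner, and toy English corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def association_ratios(sentences):
    """Score each adjacent word pair with log2(P(x,y) / (P(x) * P(y))).

    Assumption: this mutual-information-style score stands in for the
    "word association ratio" named in the abstract.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    ratios = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        ratios[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return ratios, bigrams


def select_phrases(sentences, min_count=2, min_ratio=1.0):
    """Keep pairs that are both frequent and strongly associated.

    The thresholds are illustrative defaults, not values from the paper.
    """
    ratios, bigrams = association_ratios(sentences)
    return {
        pair
        for pair, ratio in ratios.items()
        if bigrams[pair] >= min_count and ratio >= min_ratio
    }


def merge_phrases(words, phrases, joiner="_"):
    """Rewrite a word sequence so each selected pair becomes one lexicon unit."""
    merged = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in phrases:
            merged.append(words[i] + joiner + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged


# Toy corpus (hypothetical data, English tokens for readability).
corpus = [
    ["new", "york", "is", "big"],
    ["new", "york", "is", "far"],
    ["it", "is", "big"],
]
phrases = select_phrases(corpus, min_count=2, min_ratio=2.5)
print(merge_phrases(["new", "york", "is", "big"], phrases))
# prints ['new_york', 'is', 'big']
```

After merging, the rewritten corpus could be used to re-estimate the bigram model with the merged units treated as ordinary lexicon entries. This sketch handles only two-word units; longer phrases would need repeated passes or an n-gram extension.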
Bibliographic reference. Takagi, Kazuyuki / Oguro, Rei / Hashimoto, Kenji / Ozeki, Kazuhiko (1998): "Performance evaluation of word phrase and noun category language models for broadcast news speech recognition", In ICSLP-1998, paper 0026.