4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

An Evaluation of Statistical Language Modeling for Speech Recognition using a Mixed Category of Both Words and Parts-of-Speech

Yumi Wakita, Jun Kawai, Hitoshi Iida

ATR Interpreting Telecommunications Research Laboratories, Soraku-gun, Kyoto, Japan

In our previous paper, we proposed a mixed category of words and parts-of-speech, named the MWP category, based on class N-gram modeling [1]. However, we had not confirmed the effectiveness of the MWP category. In this paper, we evaluate the proposed MWP category. First, we use "coverage of word and category sequences for open data" and "perplexity for training data" as evaluation measures, and we confirm that the characteristics of parts-of-speech are useful for generating a suitable class N-gram model. Speech recognition experiments also confirm that class N-gram modeling using the MWP category is effective in improving the recognition rate for open data that shows a low coverage of word and category sequences, without decreasing the recognition rate much for closed data.
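
The two evaluation measures named in the abstract can be illustrated with a small sketch. The abstract does not give the actual rule for deciding which words remain words and which are replaced by their part-of-speech, so the `keep_words` set below is a hypothetical stand-in, as are the function names, the toy sentences, and the add-alpha smoothing. The perplexity here is computed over category sequences only and omits the word-given-category term of a full class N-gram model, so it is not directly comparable across category sets.

```python
import math
from collections import Counter

def to_mixed(tagged, keep_words):
    """Keep a word as its own category if it is in keep_words (assumed rule),
    otherwise replace it with its part-of-speech tag."""
    return [w if w in keep_words else pos for w, pos in tagged]

def bigrams(seq):
    return list(zip(seq, seq[1:]))

def coverage(train_seqs, test_seqs):
    """Fraction of bigram tokens in the test data that were seen in training."""
    seen = {bg for s in train_seqs for bg in bigrams(s)}
    test = [bg for s in test_seqs for bg in bigrams(s)]
    return sum(bg in seen for bg in test) / len(test)

def category_perplexity(train_seqs, eval_seqs, alpha=1.0):
    """Perplexity of an add-alpha smoothed category bigram model
    (category transitions only; smoothing choice is an assumption)."""
    left = Counter(c for s in train_seqs for c in s[:-1])
    pair = Counter(bg for s in train_seqs for bg in bigrams(s))
    vocab = {c for s in train_seqs + eval_seqs for c in s}
    log_sum, n = 0.0, 0
    for s in eval_seqs:
        for prev, cur in bigrams(s):
            p = (pair[(prev, cur)] + alpha) / (left[prev] + alpha * len(vocab))
            log_sum += math.log2(p)
            n += 1
    return 2 ** (-log_sum / n)

# Toy data, invented for illustration only.
train = [
    [("i", "PRON"), ("want", "VERB"), ("a", "DET"), ("room", "NOUN")],
    [("i", "PRON"), ("need", "VERB"), ("a", "DET"), ("ticket", "NOUN")],
]
open_data = [
    [("you", "PRON"), ("need", "VERB"), ("a", "DET"), ("room", "NOUN")],
]

all_words = {w for s in train + open_data for w, _ in s}
word_train = [to_mixed(s, all_words) for s in train]      # words only
word_open = [to_mixed(s, all_words) for s in open_data]
mixed_train = [to_mixed(s, {"a"}) for s in train]         # mixed words and POS
mixed_open = [to_mixed(s, {"a"}) for s in open_data]

word_cov = coverage(word_train, word_open)
mixed_cov = coverage(mixed_train, mixed_open)
mixed_ppl = category_perplexity(mixed_train, mixed_train)
print(word_cov, mixed_cov, mixed_ppl)
```

In this toy setting the word-only sequences miss the unseen pair ("you", "need") in the open data, while the mixed sequences cover it because both pronouns collapse to the same part-of-speech. This is the kind of coverage gain on open data that the abstract refers to.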

Bibliographic reference.  Wakita, Yumi / Kawai, Jun / Iida, Hitoshi (1996): "An evaluation of statistical language modeling for speech recognition using a mixed category of both words and parts-of-speech", In ICSLP-1996, 530-533.