Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper discusses an optimal method for deciding the phonemic sequence using the frame-level likelihoods of phonemes together with statistics of phoneme duration and connectivity. Phoneme recognition results for continuous speech often contain many deletion and insertion errors, so reducing these errors is important for realizing a highly advanced continuous speech recognition system. Our algorithm is based on dynamic programming (DP). The duration of each phoneme is expressed by a stochastic model, and the connectivity of phonemes is determined on the basis of phonetic and phonological knowledge. Furthermore, we propose an application to word detection that uses prosodic information to decide phrase boundaries. As a result, speaker-dependent recognition performance improves to 94.6% (3.7% insertions, 3.5% deletions) for word utterances and to 67.1% (17.7% insertions, 16.4% deletions) for sentence utterances. Word detection performance is 69.0% for independent words. These scores are much better than those obtained with our previous system.
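To make the idea of a DP search over phoneme segments concrete, the following is a minimal sketch, not the authors' implementation. It assumes frame scores are plain per-frame log-likelihoods (the paper's mutual-information-based scoring is not reproduced), that the duration statistics are available as a per-phoneme table of log-probabilities, and that connectivity is a boolean table of permitted phoneme successions. The function name `decode_phonemes` and all parameter names are hypothetical. Each segment's score is its summed frame log-likelihood plus the log-probability of its duration, and a segment may only follow a predecessor that the connectivity table allows.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def decode_phonemes(
    frame_loglik: Sequence[Sequence[float]],
    dur_logprob: Sequence[Sequence[float]],
    allowed: Sequence[Sequence[bool]],
) -> List[Tuple[int, int, int]]:
    """Segmental DP over phoneme segments (hypothetical helper).

    frame_loglik[t][p]: log-likelihood of phoneme p at frame t (assumed input).
    dur_logprob[p][d]:  log-probability that phoneme p lasts d frames
                        (index 0 unused; the table length bounds the duration).
    allowed[q][p]:      True if phoneme p may directly follow phoneme q.
    Returns the best path as (phoneme, start_frame, end_frame_exclusive) tuples.
    """
    T, P = len(frame_loglik), len(allowed)
    max_d = len(dur_logprob[0]) - 1

    # Prefix sums give the acoustic score of phoneme p over frames [s, e) in O(1).
    cum = [[0.0] * (T + 1) for _ in range(P)]
    for p in range(P):
        for t in range(T):
            cum[p][t + 1] = cum[p][t] + frame_loglik[t][p]

    # best[e][p]: best score for frames [0, e) whose last segment is phoneme p.
    best = [[NEG_INF] * P for _ in range(T + 1)]
    back: Dict[Tuple[int, int], Tuple[int, int]] = {}

    for e in range(1, T + 1):
        for p in range(P):
            for d in range(1, min(max_d, e) + 1):
                s = e - d
                seg = cum[p][e] - cum[p][s] + dur_logprob[p][d]
                if seg == NEG_INF:
                    continue  # duration d is impossible for phoneme p
                if s == 0:
                    # First segment of the utterance: no predecessor to check.
                    candidates = [(seg, -1)]
                else:
                    # Only predecessors permitted by the connectivity table.
                    candidates = [
                        (best[s][q] + seg, q)
                        for q in range(P)
                        if allowed[q][p] and best[s][q] > NEG_INF
                    ]
                for score, q in candidates:
                    if score > best[e][p]:
                        best[e][p] = score
                        back[(e, p)] = (s, q)

    # Pick the best final phoneme and trace the segments back to frame 0.
    p = max(range(P), key=lambda k: best[T][k])
    if best[T][p] == NEG_INF:
        return []  # no segmentation satisfies the duration/connectivity limits
    path, e = [], T
    while e > 0:
        s, q = back[(e, p)]
        path.append((p, s, e))
        e, p = s, q
    return path[::-1]


if __name__ == "__main__":
    # Toy data: frames 0-2 look like phoneme 0, frames 3-5 like phoneme 1.
    frames = [[-0.1, -3.0]] * 3 + [[-3.0, -0.1]] * 3
    # Uniform duration model over 1..4 frames for both phonemes.
    durs = [[NEG_INF] + [math.log(0.25)] * 4 for _ in range(2)]
    # A phoneme may not follow itself, so adjacent segments must differ.
    conn = [[False, True], [True, False]]
    print(decode_phonemes(frames, durs, conn))  # [(0, 0, 3), (1, 3, 6)]
```

Because every extra segment pays an additional duration log-probability and must pass the connectivity check, spurious short segments (insertions) are penalized, while the bounded duration table prevents a single segment from absorbing several phonemes (deletions). This is one plausible reading of how duration and connectivity constraints reduce both error types.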
Bibliographic reference. Shirai, Katsuhiko / Okawa, Shigeki / Kobayashi, Tetsunori (1992): "Phoneme recognition in continuous speech based on mutual information considering phonemic duration and connectivity", In ICSLP-1992, 1479-1482.