Sixth International Conference on Spoken Language Processing
In this paper, we describe problems in recognizing large-vocabulary Korean continuous speech and propose solutions to them. Korean sentences consist of eojeols, which are separated by spaces in text and are themselves composed of morphemes. When morphemes are used as recognition units, many word insertion and deletion errors occur because the units are too short. We introduce a between-word phone variation lexicon that can represent many alternative phone realizations of words in one structure. Decoding is performed in a single pass by a modified token-passing algorithm. In this algorithm, we allow multiple tokens in a state at a time, so that the global best path can be found without expanding the states when trigram language models are used. We confirmed that the between-word phone variation lexicon is useful for morpheme-based recognition by observing that the improvement is larger for morpheme units than for eojeol units. Allowing multiple tokens in a state also improved performance.
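The multiple-token idea in the abstract can be illustrated with a toy sketch. This is not the authors' decoder: it works on whole words rather than HMM states and frames, it ignores the phone variation lexicon, and the names `decode`, `Token`, `max_tokens`, and the score tables are all hypothetical. It only shows why keeping several tokens per state, one per predecessor word, may let a trigram score pick the global best path without creating a separate copy of each state for every history.

```python
from collections import namedtuple

# A token carries an accumulated log score and the word history that produced it.
Token = namedtuple("Token", ["score", "words"])


def decode(acoustic, trigram, max_tokens):
    """Toy word-level token passing (illustrative assumption, not the paper's code).

    acoustic   : list of dicts, one per segment, mapping word -> acoustic log score
    trigram    : function (w2, w1, w) -> language-model log score
    max_tokens : tokens kept per state; 1 mimics classic single-token passing
    """
    # Each state is the current word; it holds tokens keyed by the predecessor word.
    states = {"<s>": {"<s>": Token(0.0, ("<s>",))}}
    for segment in acoustic:
        new_states = {}
        for prev_word, tokens in states.items():
            for prev_prev, tok in tokens.items():
                for word, ac_score in segment.items():
                    score = tok.score + ac_score + trigram(prev_prev, prev_word, word)
                    slot = new_states.setdefault(word, {})
                    best = slot.get(prev_word)
                    # Tokens with the same predecessor compete; others coexist.
                    if best is None or score > best.score:
                        slot[prev_word] = Token(score, tok.words + (word,))
        # Keep only the best max_tokens tokens in each state.
        states = {}
        for word, slot in new_states.items():
            ranked = sorted(slot.items(), key=lambda kv: kv[1].score, reverse=True)
            states[word] = dict(ranked[:max_tokens])
    return max(
        (tok for slot in states.values() for tok in slot.values()),
        key=lambda tok: tok.score,
    )


# Made-up scores: "a" looks better early on, but the trigram (b, c, d) wins overall.
ACOUSTIC = [{"a": -1.0, "b": -2.0}, {"c": 0.0}, {"d": 0.0}]
TRIGRAM_TABLE = {("a", "c", "d"): -5.0, ("b", "c", "d"): 0.0}


def toy_trigram(w2, w1, w):
    # Unlisted trigrams get a flat back-off score (an arbitrary choice for the demo).
    return TRIGRAM_TABLE.get((w2, w1, w), -1.0)


single = decode(ACOUSTIC, toy_trigram, max_tokens=1)
multi = decode(ACOUSTIC, toy_trigram, max_tokens=2)
print(single.words, single.score)
print(multi.words, multi.score)
```

In this toy setup, the single-token run keeps only the locally better history at state "c" and ends on the path through "a", whereas the two-token run retains both histories and recovers the higher-scoring path through "b". The alternative would be to duplicate state "c" once per predecessor, which is the state expansion the abstract says is avoided.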
Bibliographic reference. Yu, Ha-Jin / Kim, Hoon / Hong, Joon-Mo / Kim, Min-Seong / Lee, Jong-Seok (2000): "Large vocabulary Korean continuous speech recognition using a one-pass algorithm", In ICSLP-2000, vol.4, 278-281.