ISCA Archive Eurospeech 1999

Korean large vocabulary continuous speech recognition using pseudomorpheme units

Oh-Wook Kwon, Kyuwoong Hwang, Jun Park

This paper presents a Korean large vocabulary continuous speech recognition system based on pseudomorpheme units. In Korean, an eojeol (word phrase) is the unit delimited by spaces, and a morpheme is the smallest unit that carries meaning. If the eojeol is used as the dictionary and language modeling unit, the number of distinct units becomes enormous. Instead, we propose using a modified morpheme, or pseudomorpheme, as the basic recognition unit. The original eojeol can be recovered by concatenating the graphemes of its pseudomorpheme components. We used a dictionary and language model with pseudomorpheme/part-of-speech entries, where each entry can have multiple pronunciations according to morphological rules. With a 32k-word vocabulary, the speaker-independent character, pseudomorpheme, and eojeol recognition accuracies on an economy article database were 90.8%, 84.5%, and 81.3%, respectively.
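
The recovery of eojeols from recognized pseudomorphemes can be illustrated with a minimal sketch. The abstract does not say how eojeol boundaries are represented in the recognizer output, so the `starts_new_eojeol` flag, the tuple layout, the part-of-speech labels, and the function name `recover_eojeols` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical output unit: (grapheme, part-of-speech label, starts_new_eojeol).
# The boundary flag is an assumption; the abstract does not specify it.
Unit = Tuple[str, str, bool]


def recover_eojeols(units: List[Unit]) -> List[str]:
    """Rebuild eojeols by concatenating the graphemes of their components."""
    eojeols: List[str] = []
    for grapheme, _pos, starts_new_eojeol in units:
        if starts_new_eojeol or not eojeols:
            # Open a new eojeol with this pseudomorpheme's grapheme.
            eojeols.append(grapheme)
        else:
            # Append the grapheme to the eojeol currently being built.
            eojeols[-1] += grapheme
    return eojeols


# Illustrative sequence with made-up part-of-speech labels.
recognized: List[Unit] = [
    ("학교", "noun", True),
    ("에", "particle", False),
    ("갔", "verb", True),
    ("다", "ending", False),
]

print(" ".join(recover_eojeols(recognized)))
```

Because the units are pseudomorphemes, meaning morphemes modified so that their written forms join directly, plain string concatenation is enough here and no morphological synthesis step is needed.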

Keywords: continuous speech recognition


doi: 10.21437/Eurospeech.1999-124

Cite as: Kwon, O.-W., Hwang, K., Park, J. (1999) Korean large vocabulary continuous speech recognition using pseudomorpheme units. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 483-486, doi: 10.21437/Eurospeech.1999-124

@inproceedings{kwon99_eurospeech,
  author={Oh-Wook Kwon and Kyuwoong Hwang and Jun Park},
  title={{Korean large vocabulary continuous speech recognition using pseudomorpheme units}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={483--486},
  doi={10.21437/Eurospeech.1999-124}
}