September 22-25, 1997
This paper describes a Korean speech corpus for a train ticket reservation aid system based on speech recognition. Two speech corpora were collected: one based on human-human (H-H) dialogues and the other based on human-computer (H-C) dialogues. A Wizard of Oz (WOZ) experiment was carried out to collect the H-C spoken dialogue corpus. Data from 298 speakers were collected for the H-C corpus and data from 100 speakers for the H-H corpus. Since the basic unit of grammar in Korean is the morpheme, a morpheme-based Korean language model was designed in addition to a word-based language model. Linguistic analysis shows that people respond differently when talking to a computer than when talking to a human. Language-model analysis also reveals that the morpheme-based language model gives a 50% reduction in perplexity (PP) over the word-based one.
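For readers unfamiliar with the measure, the following is a minimal sketch of how perplexity and a relative reduction such as the 50% figure above are conventionally computed. The abstract does not state the model order, smoothing method, or test data, so the add-one smoothed unigram model and the function names here (train_unigram, perplexity, relative_reduction) are illustrative assumptions, not the authors' setup.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary.

    Assumption for illustration only: the paper's actual language
    models are not specified in the abstract.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {unit: (counts[unit] + 1) / total for unit in vocab}


def perplexity(model, tokens):
    """Standard definition: PP = exp(-(1/N) * sum(log p(w_i))).

    Every token in `tokens` must be in the model's vocabulary.
    """
    log_prob = sum(math.log(model[unit]) for unit in tokens)
    return math.exp(-log_prob / len(tokens))


def relative_reduction(pp_baseline, pp_new):
    """Fractional perplexity reduction of pp_new relative to pp_baseline."""
    return (pp_baseline - pp_new) / pp_baseline
```

The tokens passed in can be words or morphemes, depending on how the text is segmented. A call such as relative_reduction(pp_word, pp_morpheme) returning 0.5 would correspond to a 50% reduction; pp_word and pp_morpheme are hypothetical variable names for the two models' perplexities.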
Bibliographic reference. Kim, Woosung / Koo, Myoung-Wan (1997): "A Korean speech corpus for train ticket reservation aid system based on speech recognition", In EUROSPEECH-1997, 1723-1726.