8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Learning for Transliteration of Arabic-Numeral Expressions using Decision Tree for Korean TTS

HyeonSook Nam (1), Youngim Jung (2), Donghun Lee (2), Hyuk-chul Kwon (2), Aesun Yoon (2)

(1) Busan Digital University, Korea
(2) Pusan National University, Korea

Despite much work on TTS technologies and several TTS systems customized for Korean, current TTS systems make many errors when transliterating the sounds of non-alphabetic symbols such as Arabic numerals and text symbols. This paper proposes TLAN (Transliteration Learner for Arabic-Numeral Expressions (NEs)), which uses a decision tree to efficiently disambiguate the reading and meaning of NEs in text. To analyze and learn from the data, three phases of learning elements are proposed: patterns of Arabic numerals combined with text symbols, contextual features, and heuristic information, each classified according to the senses and pronunciations of NEs. The corpus consists of news articles published from January 1, 2000 to December 31, 2001 in nine major Korean newspapers. Trained on the three phases of learning elements, the model achieves accuracies of 97.38% on the training set and 97.28% on the test set.
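To make the decision-tree idea concrete, here is a minimal sketch of how categorical features from the three phases could drive a tree that picks a reading class for an NE. It assumes plain ID3 (information-gain splits). The abstract does not name the induction algorithm, so this is not TLAN's actual method. The feature names (`pattern`, `next_token`, `magnitude`), the reading classes, and the toy training rows are all hypothetical illustrations, not data from the paper's corpus. The `magnitude` heuristic reflects the fact that native Korean numerals are generally used only below 100. The `next_token` context separates, for example, a time ("10:30에") from a sports score ("2:1로").

```python
import math
from collections import Counter

# Hypothetical features standing in for the three phases of learning elements:
# symbol pattern, contextual feature (following token), and a heuristic.
FEATURES = ("pattern", "next_token", "magnitude")

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one categorical feature."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, features):
    """ID3: recursively split on the highest-gain feature until nodes are pure."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node: a single reading class
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return {"feature": best, "branches": branches, "default": majority}

def classify(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        child = tree["branches"].get(row[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree

# Toy, hand-made examples (hypothetical; not taken from the paper's corpus).
TRAIN = [
    ({"pattern": "N",   "next_token": "개",   "magnitude": "small"}, "native"),    # 3개
    ({"pattern": "N",   "next_token": "명",   "magnitude": "small"}, "native"),    # 5명
    ({"pattern": "N",   "next_token": "개",   "magnitude": "large"}, "sino"),      # 150개
    ({"pattern": "N",   "next_token": "월",   "magnitude": "small"}, "sino"),      # 3월
    ({"pattern": "N",   "next_token": "원",   "magnitude": "large"}, "sino"),      # 1000원
    ({"pattern": "N:N", "next_token": "에",   "magnitude": "small"}, "time"),      # 10:30에
    ({"pattern": "N:N", "next_token": "로",   "magnitude": "small"}, "score"),     # 2:1로
    ({"pattern": "N/N", "next_token": "none", "magnitude": "small"}, "fraction"),  # 3/4
]

rows = [r for r, _ in TRAIN]
labels = [lab for _, lab in TRAIN]
tree = build_tree(rows, labels, list(FEATURES))

print(classify(tree, {"pattern": "N:N", "next_token": "로", "magnitude": "small"}))  # score
print(classify(tree, {"pattern": "N", "next_token": "개", "magnitude": "large"}))    # sino
```

On this toy data the root splits on `next_token`, because the following token alone separates most classes. The "개" branch then splits on `magnitude` to choose between native and Sino-Korean readings. A real system would learn from far more examples and would likely use pruning or C4.5-style gain ratio to avoid overfitting on sparse context values.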


Bibliographic reference.  Nam, HyeonSook / Jung, Youngim / Lee, Donghun / Kwon, Hyuk-chul / Yoon, Aesun (2004): "Learning for transliteration of arabic-numeral expressions using decision tree for Korean TTS", In INTERSPEECH-2004, 1937-1940.