Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Learning Methods and Features for Corpus-Based Phrase Break Prediction on Thai

C. Hansakunbuntheung, Ausdang Thangthai, Chai Wutiwiwatchai, Rungkarn Siricharoenchai

NECTEC, Thailand

This paper applies five well-known learning methods to Thai phrase break prediction. Phrase break prediction is particularly important for our Thai text-to-speech (TTS) synthesizer, because Thai input text has no explicit word or sentence boundaries. The learning methods are a POS sequence model, CART, RIPPER, SLIPPER, and a neural network. The features proposed for the learners can be extracted directly from the input text at run time. The best method, based on the CART model, gives 80.14% correct-break, 94.40% juncture-correct, and 2.37% false-break scores. Compared with our previous models based on C4.5 and RIPPER, the new optimized method achieves almost the best performance.
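The abstract does not define the three reported scores. As a rough illustration only, the sketch below computes them under commonly used definitions, which are an assumption here and may differ from the paper's: correct-break as the share of reference breaks that were predicted, juncture-correct as the share of all junctures labeled correctly, and false-break as the share of all junctures where a break was predicted but none exists in the reference. The function name `break_scores` and the 0/1 label encoding are hypothetical.

```python
def break_scores(reference, predicted):
    """Score phrase break predictions, one label per word juncture.

    Labels are 1 (break) or 0 (no break). The three definitions below are
    ASSUMED common conventions, not taken from the paper itself.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    n = len(reference)
    ref_breaks = sum(1 for r in reference if r)
    if n == 0 or ref_breaks == 0:
        raise ValueError("need at least one juncture and one reference break")

    pairs = list(zip(reference, predicted))
    hits = sum(1 for r, p in pairs if r and p)          # breaks found
    inserted = sum(1 for r, p in pairs if not r and p)  # breaks wrongly added
    agree = sum(1 for r, p in pairs if bool(r) == bool(p))

    return {
        "correct_break": 100.0 * hits / ref_breaks,
        "juncture_correct": 100.0 * agree / n,
        "false_break": 100.0 * inserted / n,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration; not from the paper's corpus.
    ref = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    hyp = [0, 1, 0, 1, 1, 0, 0, 0, 0, 0]
    print(break_scores(ref, hyp))
```

Under these assumed definitions, juncture-correct is dominated by the many non-break junctures, which is why it can be high even when a noticeable share of true breaks is missed.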


Bibliographic reference. Hansakunbuntheung, C. / Thangthai, Ausdang / Wutiwiwatchai, Chai / Siricharoenchai, Rungkarn (2005): "Learning methods and features for corpus-based phrase break prediction on Thai", in INTERSPEECH-2005, 1969-1972.