8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Pair-Based Language Model for the Robust Lexical Analysis in Chinese Text-to-Speech Synthesis

Wu Liu, Dezhi Huang, Yuan Dong, Xinnian Mao, Haila Wang

France Telecom R&D Beijing, China

This paper presents a robust method of lexical analysis for Chinese text-to-speech (TTS) synthesis using a pair-based language model (LM). Traditional Chinese lexical analysis treats word segmentation and part-of-speech (POS) tagging as two separate phases, each with its own algorithms and models. In practice, POS information is useful for word segmentation, and vice versa. A pair-based language model is therefore proposed to integrate basic word segmentation, POS tagging, and named entity (NE) identification into a unified framework. Objective evaluation indicates that the proposed method achieves top-level performance, confirming its effectiveness for Chinese lexical analysis.
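To make the idea of joint segmentation and tagging concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a "pair" is a (word, POS) unit and that the model is a bigram over such pairs decoded with a Viterbi search; the abstract does not specify the model order, smoothing, or how NE identification is handled, so NE identification is omitted here. The lexicon, tag names, probabilities, and the names `LEXICON`, `BIGRAM`, `UNSEEN`, and `decode` are all hypothetical toy values chosen for illustration.

```python
import math

START = ("<s>", "<s>")

# Hypothetical toy lexicon: word -> possible POS tags (assumption, not from the paper).
LEXICON = {
    "研究": ["v", "n"],
    "研究生": ["n"],
    "生命": ["n"],
    "命": ["n"],
    "起源": ["n"],
}
MAX_LEN = max(len(w) for w in LEXICON)

# Hypothetical bigram log-probabilities over (word, POS) pairs.
BIGRAM = {
    (START, ("研究", "v")): math.log(0.3),
    (START, ("研究", "n")): math.log(0.2),
    (START, ("研究生", "n")): math.log(0.3),
    (("研究", "v"), ("生命", "n")): math.log(0.4),
    (("研究", "n"), ("生命", "n")): math.log(0.1),
    (("研究生", "n"), ("命", "n")): math.log(0.01),
    (("生命", "n"), ("起源", "n")): math.log(0.5),
    (("命", "n"), ("起源", "n")): math.log(0.05),
}
UNSEEN = math.log(1e-4)  # crude floor for unseen pair bigrams (assumption)


def decode(text):
    """Return the best sequence of (word, POS) pairs covering `text`."""
    n = len(text)
    # best[i] maps a pair ending at position i to (score, (start, previous pair)).
    best = [dict() for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            word = text[start:end]
            for tag in LEXICON.get(word, []):
                pair = (word, tag)
                for prev, (score, _) in best[start].items():
                    cand = score + BIGRAM.get((prev, pair), UNSEEN)
                    if pair not in best[end] or cand > best[end][pair][0]:
                        best[end][pair] = (cand, (start, prev))
    if not best[n]:
        raise ValueError("text cannot be segmented with this lexicon")
    # Backtrace from the highest-scoring pair at the end of the text.
    pair = max(best[n], key=lambda p: best[n][p][0])
    pos, path = n, []
    while pair != START:
        path.append(pair)
        _, (pos, pair) = best[pos][pair]
    path.reverse()
    return path


if __name__ == "__main__":
    print(decode("研究生命起源"))
```

With these toy values, the segmentation 研究/v 生命/n 起源/n outscores the competing 研究生/n 命/n 起源/n, because the word boundary and the tag are chosen together in a single search rather than in two separate passes.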

Bibliographic reference.  Liu, Wu / Huang, Dezhi / Dong, Yuan / Mao, Xinnian / Wang, Haila (2007): "A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis", In INTERSPEECH-2007, 1905-1908.