This paper presents a robust method of lexical analysis for Chinese text-to-speech (TTS) synthesis using a pair-based language model (LM). Traditional Chinese lexical analysis treats word segmentation and part-of-speech (POS) tagging as two separate phases, each with its own algorithms and models. In fact, POS information is useful for word segmentation, and vice versa. A pair-based language model is therefore proposed to integrate basic word segmentation, POS tagging, and named entity (NE) identification into a unified framework. Objective evaluation indicates that the proposed method achieves top-level performance and confirms its effectiveness for Chinese lexical analysis.
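As a rough illustration of why scoring words and POS tags jointly can help, the sketch below runs a Viterbi search over (word, POS) pairs instead of segmenting first and tagging afterwards. It is not the model from the paper: the abstract does not give the model's form, so the bigram-with-backoff scoring, the toy lexicon, all probabilities, and the names `UNIGRAM`, `BIGRAM`, `BACKOFF`, `pair_logprob`, and `analyze` are assumptions made for this example. Named entity identification is left out.

```python
import math

# Hypothetical toy lexicon: unigram probabilities of (word, POS) pairs.
UNIGRAM = {
    ("研究", "v"): 0.20,
    ("研究", "n"): 0.10,
    ("研究生", "n"): 0.10,
    ("生命", "n"): 0.20,
    ("命", "n"): 0.05,
    ("起源", "n"): 0.20,
    ("起源", "v"): 0.05,
}
# Hypothetical bigram probabilities P(current pair | previous pair).
BIGRAM = {
    (("研究", "v"), ("生命", "n")): 0.5,
    (("生命", "n"), ("起源", "n")): 0.5,
}
BACKOFF = 0.1            # assumed penalty when a pair bigram is unseen
START = ("<s>", "<s>")   # sentence-start marker

# Index the POS tags available for each surface word.
POS_BY_WORD = {}
for (word, pos) in UNIGRAM:
    POS_BY_WORD.setdefault(word, []).append(pos)


def pair_logprob(prev, cur):
    """Log-probability of pair `cur` following pair `prev`, backing off to the unigram."""
    if (prev, cur) in BIGRAM:
        return math.log(BIGRAM[(prev, cur)])
    return math.log(BACKOFF * UNIGRAM[cur])


def analyze(text, max_len=4):
    """Return the best-scoring list of (word, POS) pairs covering `text`."""
    # best[i] maps the last pair of a path ending at character i
    # to (log-prob, previous pair, start index of the last word).
    best = [dict() for _ in range(len(text) + 1)]
    best[0][START] = (0.0, None, None)
    for i in range(len(text)):
        for prev, (prev_score, _, _) in best[i].items():
            for j in range(i + 1, min(len(text), i + max_len) + 1):
                word = text[i:j]
                for pos in POS_BY_WORD.get(word, ()):
                    cur = (word, pos)
                    score = prev_score + pair_logprob(prev, cur)
                    if cur not in best[j] or score > best[j][cur][0]:
                        best[j][cur] = (score, prev, i)
    if not best[len(text)]:
        raise ValueError("no analysis covers the whole input")
    # Trace back from the best final pair to the start marker.
    pair = max(best[len(text)], key=lambda p: best[len(text)][p][0])
    i = len(text)
    result = []
    while pair != START:
        result.append(pair)
        _, prev, start = best[i][pair]
        pair, i = prev, start
    return result[::-1]


print(analyze("研究生命起源"))
```

With these toy numbers, the search prefers 研究/v 生命/n 起源/n over the competing segmentation 研究生/n 命/n 起源/n, because the segmentation and tagging decisions are scored together in a single pass.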
Bibliographic reference. Liu, Wu / Huang, Dezhi / Dong, Yuan / Mao, Xinnian / Wang, Haila (2007): "A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis", In INTERSPEECH-2007, 1905-1908.