8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Mutual-Information Based Segment Pre-selection in Concatenative Text-to-Speech

Wei Zhang, Ling Jin, Xijun Ma

IBM China Research Lab., China

Corpus-based concatenative text-to-speech (CTTS) systems have proven successful at producing speech with good voice quality. However, they require a large inventory of synthesis segments and complex search algorithms, which can limit the usability of CTTS. Segment pre-selection aims to prune the candidate segments so as to achieve the best possible synthesis quality within a predefined inventory size, making CTTS usable in environments where memory and CPU are critically constrained. This paper presents a novel pre-selection method that integrates mutual information (MI), a well-known concept in statistics. Objective and subjective evaluations of the synthesized speech show that the new approach outperforms two conventional pre-selection methods widely used in current CTTS systems.

Bibliographic reference. Zhang, Wei / Jin, Ling / Ma, Xijun (2004): "Mutual-information based segment pre-selection in concatenative text-to-speech", in INTERSPEECH-2004, 1389-1392.