INTERSPEECH 2004 - ICSLP
Corpus-based concatenative text-to-speech (CTTS) systems have proven to be a successful way to produce speech with good voice quality. However, they require a large inventory of synthesis segments and complex search algorithms, which sometimes hinders the usability of CTTS. Segment pre-selection aims to prune the candidate segments so as to achieve the best possible synthesis quality within a predefined inventory size, making CTTS usable in environments where memory and CPU are critically constrained. This paper presents a novel pre-selection method that integrates Mutual Information (MI), a well-known concept in statistics. Objective and subjective evaluations of the synthesized speech show that this new approach outperforms two conventional pre-selection methods widely used in current CTTS systems.
Bibliographic reference. Zhang, Wei / Jin, Ling / Ma, Xijun (2004): "Mutual-information based segment pre-selection in concatenative text-to-speech", In INTERSPEECH-2004, 1389-1392.