Interspeech'2005 - Eurospeech
This paper presents a new approach to unit selection for corpus-based speech synthesis, in which the units are selected according to acoustic criteria. In a learning stage, acoustic clustering is carried out using context-dependent HMMs. During synthesis, an acoustic target is generated and segmented into the required diphone sequence. For each diphone to be synthesized, a pre-selection module determines the N-best instances that match this acoustic target. From these candidates, the optimal unit sequence is then obtained by minimizing a concatenation cost through dynamic programming. Objective and subjective tests show the relevance of the proposed method.
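The two synthesis-time steps described above (N-best pre-selection against an acoustic target, then a dynamic-programming search that minimizes concatenation cost) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `preselect` and `select_units`, and the `target_cost` and `concat_cost` callables, are hypothetical placeholders, since the abstract does not specify the acoustic distance or the join cost actually used.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a unit instance (e.g. one diphone occurrence in the corpus)
T = TypeVar("T")  # an acoustic target segment for one diphone


def preselect(target: T, instances: Sequence[U],
              target_cost: Callable[[T, U], float], n_best: int) -> List[U]:
    """Keep the n_best instances closest to the acoustic target.

    target_cost is an assumed, user-supplied distance; lower is better.
    """
    return sorted(instances, key=lambda u: target_cost(target, u))[:n_best]


def select_units(candidates: Sequence[Sequence[U]],
                 concat_cost: Callable[[U, U], float]) -> Tuple[List[U], float]:
    """Viterbi-style search over pre-selected candidates.

    candidates[t] holds the candidates for the t-th diphone. Only the
    concatenation cost between consecutive units is accumulated, as in
    the description above. Returns (best sequence, total cost).
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every diphone needs at least one candidate")

    best = [0.0] * len(candidates[0])   # best cost of a path ending at each candidate
    back: List[List[int]] = []          # back-pointers, one list per transition

    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for unit in candidates[t]:
            costs = [best[i] + concat_cost(prev, unit)
                     for i, prev in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)], total
```

In this sketch the target cost is used only to prune the candidate lists, and the search itself considers the concatenation cost alone, which is one reading of the abstract; a real system would operate on acoustic feature vectors rather than on the generic unit type used here.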
Bibliographic reference. Rouibia, Soufiane / Rosec, Olivier (2005): "Unit selection for speech synthesis based on a new acoustic target cost", In INTERSPEECH-2005, 2565-2568.