Interspeech'2005 - Eurospeech
Many applications of TTS involve both unpredictable words, which require the flexibility of TTS, and static phrases, for which the quality of recorded speech is unmatched by TTS. "Phrase-splicing" TTS attempts to provide the optimal combination of the two. It customizes concatenative TTS for such applications by incorporating application-specific recordings at the word or phrase level, and it falls back on smaller-unit synthesis to fill the gaps not covered by those recordings. In the past, we achieved this with a word-level search over the application-specific recordings, followed by a general-purpose TTS search (in our case, over sub-phonetic units) to fill the gaps. However, the recent trend toward larger units in general-purpose TTS suggests a single-search approach to phrase splicing. A listening test shows that the new one-search algorithm achieves quality at least as high as the two-search approach.
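To make the one-search idea concrete, the following is a minimal sketch, not the authors' algorithm. It uses a dynamic-programming search in which recorded word or phrase spans and synthesized material compete for the same positions, rather than committing to recorded spans first and filling gaps afterwards. Several simplifications are assumptions made only for illustration: the function name `single_search` and all cost constants are hypothetical, synthesis is modeled as one fixed cost per word instead of a search over sub-phonetic units, and a flat penalty stands in for real join costs.

```python
from typing import List, Optional, Tuple

# Hypothetical costs, chosen only for illustration (not from the paper).
RECORDED_COST_PER_WORD = 0.1  # recorded audio assumed close to ideal
TTS_COST_PER_WORD = 1.0       # synthesized words assumed costlier
JOIN_COST = 0.5               # flat penalty at each boundary between units


def single_search(
    words: List[str], recordings: List[List[str]]
) -> Tuple[float, List[Tuple[str, List[str]]]]:
    """Cover `words` with recorded spans and synthesized words in one DP pass.

    best[i] is the lowest cost of covering words[:i]; back[i] stores the
    start index and kind of the last unit used to reach position i.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        if best[i] == float("inf"):
            continue
        join = JOIN_COST if i > 0 else 0.0
        # Option 1: synthesize the single word at position i.
        cost = best[i] + join + TTS_COST_PER_WORD
        if cost < best[i + 1]:
            best[i + 1] = cost
            back[i + 1] = (i, "tts")
        # Option 2: use any recording whose text matches words[i:j].
        for rec in recordings:
            if not rec:
                continue
            j = i + len(rec)
            if j <= n and words[i:j] == rec:
                cost = best[i] + join + RECORDED_COST_PER_WORD * len(rec)
                if cost < best[j]:
                    best[j] = cost
                    back[j] = (i, "recorded")
    # Trace the best path backwards from the end of the sentence.
    units: List[Tuple[str, List[str]]] = []
    j = n
    while j > 0:
        i, kind = back[j]
        units.append((kind, words[i:j]))
        j = i
    units.reverse()
    return best[n], units


words = "your flight to boston departs at nine".split()
recordings = [["your", "flight", "to"], ["departs", "at"], ["departs", "at", "nine"]]
cost, units = single_search(words, recordings)
print(round(cost, 2), units)
```

Here the search picks the longer recording "departs at nine" over "departs at" plus a synthesized "nine", because both options are scored in the same lattice; only "boston" is synthesized.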
Bibliographic reference. Hamza, Wael / Pitrelli, John F. (2005): "Combining the flexibility of speech synthesis with the naturalness of pre-recorded audio: a comparison of two approaches to phrase-splicing TTS", In INTERSPEECH-2005, 2585-2588.