EUROSPEECH 2003 - INTERSPEECH 2003
Speech unit selection algorithms have the task of finding a single sequence of speech units that optimally fits the target transcription of an utterance to be synthesized. In doing so, these algorithms ignore a very large number of possible alternative unit sequences, each of which leads to an alternative rendering of that utterance. In this paper we set out to explore these alternative unit sequences by introducing interactive unit selection.
Interactive unit selection is based on feedback from a listener. To collect this feedback, we implement two levels of control: an elaborate GUI and a simple XML tag mechanism. The GUI offers access to unit selection at the granularity of a single speech unit and allows the user to set prosodic constraints on the selection of alternative speech units. The XML tag mechanism operates on words and allows the user to request an nth-best alternative selection.
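The abstract does not describe the search or cost functions used in the authors' system, so the following is only a minimal sketch, assuming a conventional unit selection lattice (one list of candidate units per target position) scored by a target cost plus a join cost. It illustrates the two controls described above in generic form: an n-best search that keeps the alternative sequences a single-best search would discard, and a prosodic constraint that restricts the candidates at one position. The `Unit` type, the pitch/duration features, the cost weights, and the helpers `n_best` and `constrain` are all hypothetical and chosen for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical candidate speech unit from the unit database."""
    unit_id: str
    pitch: float     # mean F0 in Hz (illustrative feature)
    duration: float  # duration in ms (illustrative feature)


def target_cost(unit, target):
    # Hypothetical: weighted prosodic distance between a unit and its target.
    return (abs(unit.pitch - target["pitch"]) / 10.0
            + abs(unit.duration - target["duration"]) / 20.0)


def join_cost(left, right):
    # Hypothetical: pitch mismatch at the concatenation point.
    return abs(left.pitch - right.pitch) / 10.0


def n_best(candidates, targets, n):
    """Return up to n lowest-cost sequences as (cost, [unit_id, ...]).

    k-best dynamic programming: for every candidate we keep the n cheapest
    partial paths ending in it, which is enough to recover the global n-best.
    """
    beams = [[(target_cost(u, targets[0]), (u.unit_id,))] for u in candidates[0]]
    for i in range(1, len(targets)):
        prev_units = candidates[i - 1]
        new_beams = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i])
            # Extend every kept path ending in each predecessor by unit u.
            extensions = (
                (cost + join_cost(p, u) + tc, path + (u.unit_id,))
                for p, beam in zip(prev_units, beams)
                for cost, path in beam
            )
            new_beams.append(heapq.nsmallest(n, extensions))
        beams = new_beams
    finals = (item for beam in beams for item in beam)
    return [(cost, list(path)) for cost, path in heapq.nsmallest(n, finals)]


def constrain(candidates, position, min_pitch=float("-inf"), max_pitch=float("inf")):
    """Return a copy of the lattice with one position limited to a pitch range."""
    kept = [u for u in candidates[position] if min_pitch <= u.pitch <= max_pitch]
    return candidates[:position] + [kept] + candidates[position + 1:]
```

With a three-position lattice of two candidates each, `n_best(lattice, targets, 3)` returns the best sequence followed by the two next-cheapest alternatives; picking the third entry loosely mirrors a request for a 3rd-best selection. Calling `constrain(lattice, 1, max_pitch=120.0)` before searching removes high-pitched candidates at the second position, loosely mirroring a listener-imposed prosodic constraint. Keeping n paths per candidate, rather than enumerating every sequence, keeps the search linear in utterance length for a fixed n.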
Results show that interactive unit selection succeeds in correcting most of the synthesis problems that occur in our default synthesis system, and that it provides very detailed information that can be used to improve our run-time algorithms. This work not only provides a powerful research tool but also leads to a number of commercial applications. The GUI can be used efficiently to improve speech synthesis off-line, to the extent that it eliminates the need to make special recordings for domain-specific applications. The XML tag, on the other hand, can be used to quickly optimize the output of the system.
Bibliographic reference. Rutten, Peter / Fackrell, Justin (2003): "The application of interactive speech unit selection in TTS systems", In EUROSPEECH-2003, 285-288.