Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Segment Selection in the L&H RealSpeak Laboratory TTS System

Geert Coorman, Justin Fackrell, Peter Rutten, Bert Van Coile

Lernout & Hauspie Speech Products NV, Flanders Language Valley, Ieper, Belgium

The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus-based speech synthesis system comprising components for linguistic processing, prosody prediction, segment selection, concatenation and modification. In this paper we focus on the segment selection process. During segment selection, each unit in a large speech database is assigned a cost according to its prosodic/phonetic mismatch with the target description of the utterance to be synthesized. This prosodic/phonetic cost is computed from a combination of symbolic and numeric features. The candidate units from the speech database are then evaluated for the ease with which they can be concatenated. A dynamic programming algorithm, using additive costs, finds the optimal path of candidates to represent the spoken utterance. The chosen segments are then concatenated in the time domain to yield a smooth-sounding speech signal with natural-sounding prosody. One of the keys to the success of the segment selection component is the context-dependent choice of cost functions, together with the method of combining the costs from the various features. The RSLab system makes use of a family of complex cost functions that allows linguistic and perceptual knowledge to be incorporated in the segment selection process.
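The search described in the abstract can be illustrated with a generic dynamic programming (Viterbi-style) sketch over additive costs. This is not the RSLab implementation: the function name `select_segments`, its parameters, and the two cost callbacks are hypothetical, and the context-dependent cost functions of the actual system are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target description for one unit (hypothetical type)
U = TypeVar("U")  # candidate unit from the speech database (hypothetical type)


def select_segments(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    concat_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Return the candidate path with the lowest total additive cost.

    Illustrative only: target_cost stands in for the prosodic/phonetic
    mismatch cost and concat_cost for the concatenation cost.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need exactly one candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate")

    # best[i][j]: lowest cost of any path ending in candidate j at position i.
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path.
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cheapest predecessor, including the cost of joining it to u.
            k, c = min(
                (
                    (k, best[i - 1][k] + concat_cost(p, u))
                    for k, p in enumerate(candidates[i - 1])
                ),
                key=lambda kc: kc[1],
            )
            row_cost.append(c + target_cost(targets[i], u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)

    # Trace the optimal path back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

Because the costs are additive, the cheapest path to each candidate depends only on the cheapest paths to its predecessors, so the search grows linearly with the number of targets and quadratically with the number of candidates per target. A candidate with a poor target cost can still be selected if it joins well with its neighbours, which a purely greedy per-unit choice would miss.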

