Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Segment Selection in the L&H RealSpeak Laboratory TTS System
Geert Coorman, Justin Fackrell, Peter Rutten, Bert Van Coile
Lernout & Hauspie Speech Products NV,
Flanders Language Valley, Ieper, Belgium
The L&H RealSpeak Laboratory TTS (RSLab) system is a
corpus-based speech synthesis system comprising components
that deal with linguistic processing, prosody prediction, segment
selection, concatenation and modification. In this paper we focus
on the segment selection process.
During segment selection, the units in a large database of
speech are scored with a cost according to their
prosodic/phonetic mismatch with the target description of the
utterance to be synthesized. This prosodic/phonetic cost is
computed on the basis of a combination of symbolic and
numeric features. The candidate units from the speech database
are then evaluated for the ease with which they can be
concatenated. A dynamic programming algorithm with additive
costs then finds the optimal path of candidates to represent
the spoken utterance. The chosen segments are then
concatenated in the time domain to yield a smooth-sounding
speech signal with natural-sounding prosody.
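The search described above can be illustrated with a minimal sketch of a dynamic programming (Viterbi-style) pass over additive costs. This is not the RSLab implementation: the function name select_units, the callables target_cost and join_cost, and the representation of candidates as one list per target position are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # a candidate unit; its concrete type is left open (hypothetical)


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Return one candidate per position with minimum total additive cost."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[i][j]: cheapest accumulated cost of a path ending at candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor chosen for candidates[i][j]
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Choose the predecessor minimising accumulated cost plus join cost.
            k = min(
                range(len(prev)),
                key=lambda p: best[i - 1][p] + join_cost(prev[p], u),
            )
            row_cost.append(best[i - 1][k] + join_cost(prev[k], u) + target_cost(i, u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

In this sketch the cost of a path is simply the sum of the per-unit mismatch costs and the costs of the joins between neighbouring units, so the search takes time proportional to the number of positions times the square of the number of candidates per position.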
One of the keys to the success of the segment selection
component is the context-dependent choice of cost functions,
and the method of combining the costs from the various features.
The RSLab system makes use of a family of complex cost
functions that allows linguistic and perceptual knowledge to be
incorporated in the segment selection process.
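As a rough illustration of combining costs from symbolic and numeric features with a context-dependent choice of weighting, the sketch below uses a plain weighted sum. This is a deliberately simple stand-in and not the family of cost functions used in RSLab, which the abstract does not specify. The feature names (phone, pitch_hz), the context labels, the weights and the scales are all hypothetical.

```python
from typing import Mapping, Union

Feature = Union[str, float]  # symbolic features are strings, numeric ones floats


def feature_cost(target: Feature, candidate: Feature, scale: float = 1.0) -> float:
    """Mismatch in [0, 1] for a single feature."""
    if isinstance(target, str) or isinstance(candidate, str):
        # Symbolic feature: 0 on a match, 1 on any mismatch.
        return 0.0 if target == candidate else 1.0
    # Numeric feature: absolute difference, normalised by scale and clipped at 1.
    return min(abs(target - candidate) / scale, 1.0)


# Hypothetical weights: pitch mismatch is assumed to matter more in stressed context.
WEIGHTS_BY_CONTEXT = {
    "stressed": {"phone": 1.0, "pitch_hz": 2.0},
    "unstressed": {"phone": 1.0, "pitch_hz": 0.5},
}
SCALES = {"pitch_hz": 40.0}  # hypothetical normalisation constant


def target_cost(
    target: Mapping[str, Feature],
    candidate: Mapping[str, Feature],
    context: str,
) -> float:
    """Weighted sum of per-feature mismatches, with weights chosen by context."""
    weights = WEIGHTS_BY_CONTEXT[context]
    return sum(
        w * feature_cost(target[name], candidate[name], SCALES.get(name, 1.0))
        for name, w in weights.items()
    )
```

With these made-up numbers, the same 20 Hz pitch mismatch is penalised more heavily in a stressed context than in an unstressed one, which is the kind of knowledge a context-dependent cost function can encode.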
Coorman, Geert / Fackrell, Justin / Rutten, Peter / Van Coile, Bert (2000):
"Segment selection in the L&H RealSpeak Laboratory TTS system",
In ICSLP-2000, vol.2, 395-398.