7th International Conference on Spoken Language Processing
September 16-20, 2002
Like many current TTS systems, the AT&T German text-to-speech system is based on unit selection and concatenative synthesis. This paper highlights efforts to improve TTS quality by closely matching the speakers' original productions with their linguistic descriptions. On the segmental level, this is achieved in two ways: by adjusting the speakers' individual productions to an established, general norm via strict monitoring, and, correspondingly, by having the linguistic representations that control automatic alignment and TTS output, i.e. the recognition dictionary and the letter-to-sound rules, reflect those original productions. The chosen standard represents a realistic form of spoken German that avoids overly formal pronunciations. A perceptual comparison with a more traditional interpretation of German pronunciation demonstrates the positive effect of these measures on overall synthesis quality.
Bibliographic reference. Jilka, Matthias / Syrdal, Ann K. (2002): "The AT&T German text-to-speech system: realistic linguistic description", In ICSLP-2002, 113-116.