Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Corpus-Based Techniques in the AT&T Nextgen Synthesis System

Ann K. Syrdal, Colin W. Wightman (1), Alistair Conkie, Yannis Stylianou, Mark Beutnagel, Juergen Schroeter, Volker Strom, Ki-Seung Lee, Matthew J. Makashay (2)

AT&T Labs - Research, Florham Park, NJ, USA
(1) also Dept. of Computer and Information Sciences, Minnesota State University, Mankato, MN, USA
(2) also Dept. of Linguistics, Ohio State University, Columbus, OH, USA

The AT&T text-to-speech (TTS) synthesis system has been used as a framework for experimenting with a perceptually guided, data-driven approach to speech synthesis, with primary focus on data-driven elements in the "back end". Statistical training techniques applied to a large corpus are used to make decisions about predicted speech events and selected speech inventory units. Our recent advances in automatic phonetic and prosodic labeling, together with new, faster implementations of the harmonic plus noise model (HNM) and of unit preselection, have significantly improved TTS quality and reduced both development time and runtime.


Bibliographic reference. Syrdal, Ann K. / Wightman, Colin W. / Conkie, Alistair / Stylianou, Yannis / Beutnagel, Mark / Schroeter, Juergen / Strom, Volker / Lee, Ki-Seung / Makashay, Matthew J. (2000): "Corpus-based techniques in the AT&T Nextgen synthesis system", In ICSLP-2000, vol.3, 410-415.