Sixth International Conference on Spoken Language Processing
The AT&T text-to-speech (TTS) synthesis system has been used as a framework for experimenting with a perceptually guided, data-driven approach to speech synthesis, with primary focus on data-driven elements in the "back end". Statistical training techniques applied to a large corpus are used to make decisions about predicted speech events and selected speech inventory units. Our recent advances in automatic phonetic and prosodic labeling, together with new, faster implementations of the harmonic plus noise model (HNM) and of unit preselection, have significantly improved TTS quality and reduced both development time and runtime.
Bibliographic reference. Syrdal, Ann K. / Wightman, Colin W. / Conkie, Alistair / Stylianou, Yannis / Beutnagel, Mark / Schroeter, Juergen / Strom, Volker / Lee, Ki-Seung / Makashay, Matthew J. (2000): "Corpus-based techniques in the AT&T NextGen synthesis system", In ICSLP-2000, vol.3, 410-415.