The AT&T text-to-speech (TTS) synthesis system has been used as a framework for experimenting with a perceptually guided, data-driven approach to speech synthesis, with primary focus on data-driven elements in the "back end". Statistical training techniques applied to a large corpus are used to make decisions about predicted speech events and selected speech inventory units. Our recent advances in automatic phonetic and prosodic labeling, together with new, faster implementations of the harmonic plus noise model (HNM) and unit preselection, have significantly improved TTS quality and reduced both development time and runtime.
Cite as: Syrdal, A.K., Wightman, C.W., Conkie, A., Stylianou, Y., Beutnagel, M., Schroeter, J., Strom, V., Lee, K.-S., Makashay, M.J. (2000) Corpus-based techniques in the AT&T nextgen synthesis system. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 410-415
@inproceedings{syrdal00b_icslp, author={Ann K. Syrdal and Colin W. Wightman and Alistair Conkie and Yannis Stylianou and Mark Beutnagel and Juergen Schroeter and Volker Strom and Ki-Seung Lee and Matthew J. Makashay}, title={{Corpus-based techniques in the AT&T nextgen synthesis system}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, volume={3}, pages={410--415} }