The design principles and collection procedures behind a speech synthesis corpus directly affect the performance of the resulting text-to-speech system. This paper describes the design and collection of the Victoria corpus, created to support speech synthesis research and development at Apple Computer. The corpus comprises five parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. It was spoken in general U.S. English by one linguistically trained adult female. Portions of the corpus are being used in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system, MacinTalk 4.
Cite as: Silverman, K., Anderson, V., Bellegarda, J., Lenzo, K., Naik, D. (1999) Design and collection of a corpus of polyphones and prosodic contexts for speech synthesis research and development. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2707-2708, doi: 10.21437/Eurospeech.1999-497
@inproceedings{silverman99_eurospeech,
  author={Kim Silverman and Victoria Anderson and Jerome Bellegarda and Kevin Lenzo and Devang Naik},
  title={{Design and collection of a corpus of polyphones and prosodic contexts for speech synthesis research and development}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2707--2708},
  doi={10.21437/Eurospeech.1999-497}
}