Sixth European Conference on Speech Communication and Technology
The design principles and collection procedures behind a speech synthesis corpus directly affect the performance of the resulting text-to-speech system. This paper describes the design and collection of the Victoria corpus, created to support speech synthesis research and development at Apple Computer. The corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. It was spoken in general U.S. English by one linguistically trained adult female. Portions of the corpus are being used in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system, MacinTalk 4.
Bibliographic reference. Silverman, Kim / Anderson, Victoria / Bellegarda, Jerome / Lenzo, Kevin / Naik, Devang (1999): "Design and collection of a corpus of polyphones and prosodic contexts for speech synthesis research and development", In EUROSPEECH'99, 2707-2708.