Sixth ISCA Workshop on Speech Synthesis
This paper reports on the creation of German unit selection voices from corpora that were previously recorded and annotated in the BITS project. We describe the unit selection mechanism of our MARY TTS platform and the tools for creating a synthesis voice from a speech corpus, and we apply these tools to the BITS corpora. Because of concerns about mismatches between the phonetic chains predicted by the German TTS components in MARY and the manually corrected database labels, we compared voices based on the manually corrected labels with voices based on automatic forced-alignment labelling. We compute the diphone coverage for both types of voices and show that it is a reasonable approximation of the German diphone set. A preliminary evaluation confirms the expectations: the manually corrected versions show higher segmental accuracy, whereas the automatically labelled versions sound more fluent.
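As an illustration of what a diphone coverage figure measures, the following minimal sketch counts which diphones of a target inventory occur at least once in a labelled corpus. It is not the tooling used in the paper: the function names, the toy phone sequences, and the toy target inventory are all assumptions made for this example, and "_" is used here as a hypothetical pause symbol.

```python
from typing import Iterable, Sequence, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: Sequence[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in one utterance."""
    return set(zip(phones, phones[1:]))


def diphone_coverage(utterances: Iterable[Sequence[str]],
                     target: Set[Diphone]) -> float:
    """Fraction of the target diphone inventory seen at least once."""
    if not target:
        raise ValueError("target diphone inventory must not be empty")
    seen: Set[Diphone] = set()
    for phones in utterances:
        seen |= diphones(phones)
    return len(seen & target) / len(target)


# Toy data (hypothetical, not taken from the BITS corpora).
toy_corpus = [
    ["_", "h", "a", "l", "o", "_"],
    ["_", "a", "l", "_"],
]
toy_target = {
    ("_", "h"), ("h", "a"), ("a", "l"), ("l", "o"),
    ("o", "_"), ("l", "_"), ("_", "a"), ("a", "_"),
}

coverage = diphone_coverage(toy_corpus, toy_target)
```

Running the same function once over the manually corrected labels and once over the forced-alignment labels would give two comparable coverage figures, provided both label sets use the same phone inventory.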
Audio samples: A1, M1, A2, M2, A3, M3, A4, M4
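The audio files contain one sentence synthesised with both versions (A: automatically labelled; M: manually corrected labels) of each of the four voices that were built (1-4): "Beide Großgruppen der Gesellschaft hätten dann die Möglichkeit, den Konflikt an der Wahlurne auszutragen." ("Both major groups in society would then have the opportunity to settle the conflict at the ballot box.")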
Bibliographic reference. Schröder, Marc / Hunecke, Anna (2007): "Creating German unit selection voices for the MARY TTS platform from the BITS corpora", In SSW6-2007, 95-100.