SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages
When developing synthesizers for new languages, one must select a phoneset, record phonetically balanced sentences, build a pronunciation lexicon, and evaluate the results. An objective measure of voice quality can be very useful, provided it is calibrated across multiple speakers, languages, and databases. As a substitute for full listening tests, this paper adopts mel-cepstral distortion (MCD) as a measure of spectral accuracy, and proposes systematic variation of a known English corpus as a method of calibration. We find that doubling the database size reduces MCD by 0.12, while reverting to a grapheme-based voice increases it by 0.27. This offers a frame of reference for estimating voice quality, which is then applied to a test suite of 8 non-English languages.
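The abstract does not spell out how MCD is computed. The sketch below uses the commonly cited per-frame definition, MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²), averaged over frames. It assumes that the reference and synthesized mel-cepstral frames are already time-aligned (for example, by dynamic time warping, which is not shown) and that coefficient 0 (energy) is excluded. These are assumptions, not details confirmed by this page, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Conventional MCD scaling factor (10 / ln 10) * sqrt(2), giving a value in dB.
_MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_mcd(ref: Sequence[float], syn: Sequence[float]) -> float:
    """MCD for one pair of time-aligned frames; coefficient 0 (energy) is skipped."""
    if len(ref) != len(syn):
        raise ValueError("frames must have the same number of coefficients")
    squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
    return _MCD_CONST * math.sqrt(squared)


def mean_mcd(ref_frames: Sequence[Sequence[float]],
             syn_frames: Sequence[Sequence[float]]) -> float:
    """Average per-frame MCD over an utterance (frames assumed pre-aligned)."""
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need the same, non-zero number of frames in each sequence")
    total = sum(frame_mcd(r, s) for r, s in zip(ref_frames, syn_frames))
    return total / len(ref_frames)
```

Identical frames give an MCD of 0.0. Because coefficient 0 is dropped, a difference in overall energy alone does not raise the score, which keeps the measure focused on spectral shape.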
Bibliographic reference. Kominek, John / Schultz, Tanja / Black, Alan W. (2008): "Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion", In SLTU-2008, 63-68.