12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Automatically Creating a Diphone Set from a Speech Database

Thomas Ewender, Beat Pfister

ETH Zürich, Switzerland

This paper presents a measure that scores various aspects of phone quality. The measure is designed to penalize phone instances that have one or more characteristics undesirable in concatenation-based speech synthesis. Depending on the phone type, these aspects include, among others, spectrum, phase, fundamental frequency, duration, voicing and plosive quality. We applied this quality measure to select diphone sets from four different speech databases and demonstrate the quality of these diphone sets by means of synthesis examples. The examples show that the proposed measure can be used to select a high-quality diphone set from a speech database.
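
The abstract does not give the form of the measure, so the following is only an illustrative sketch of penalty-based selection in general, not the authors' method. It assumes each candidate unit carries a per-aspect penalty (0 meaning ideal), that the penalties are combined as a weighted sum, and that the lowest-scoring candidate per diphone is kept. The names `Candidate`, `total_penalty`, `select_diphone_set`, the aspect keys and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    diphone: str     # label of the unit, e.g. "a-t"
    penalties: dict  # hypothetical per-aspect penalties, 0.0 = ideal


def total_penalty(candidate, weights):
    """Weighted sum of per-aspect penalties (assumed combination rule)."""
    return sum(
        weights.get(aspect, 1.0) * value
        for aspect, value in candidate.penalties.items()
    )


def select_diphone_set(candidates, weights):
    """Keep, for every diphone label, the candidate with the lowest penalty."""
    best = {}
    for cand in candidates:
        current = best.get(cand.diphone)
        if current is None or total_penalty(cand, weights) < total_penalty(current, weights):
            best[cand.diphone] = cand
    return best


# Made-up candidates, purely for illustration.
candidates = [
    Candidate("a-t", {"spectrum": 0.2, "duration": 0.5}),
    Candidate("a-t", {"spectrum": 0.1, "duration": 0.1}),
    Candidate("t-a", {"spectrum": 0.3, "f0": 0.0}),
]
weights = {"spectrum": 2.0}  # aspects without an entry default to weight 1.0

selected = select_diphone_set(candidates, weights)
```

Here `selected["a-t"]` is the second candidate, because its weighted penalty is lower than that of the first. A real measure would presumably depend on the phone type, as the abstract indicates, which this sketch does not model.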

Bibliographic reference. Ewender, Thomas / Pfister, Beat (2011): "Automatically creating a diphone set from a speech database", in INTERSPEECH-2011, 2169-2172.