This paper presents a measure that scores various aspects of phone quality. The measure is designed to penalize phone instances that exhibit one or more characteristics undesirable in concatenation-based speech synthesis. Depending on the phone type, these aspects include, among others, spectrum, phase, fundamental frequency, duration, voicing, and plosive quality. We applied this quality measure to select diphone sets from four different speech databases and demonstrate the quality of these diphone sets by means of synthesis examples. These examples show that the proposed measure can be used to select a high-quality diphone set from a speech database.
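The selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the individual aspects are scored or combined, so the plain sum of penalties, the choice of the lowest-penalty candidate per diphone, and all names used here (`PhoneInstance`, `DiphoneCandidate`, `phone_penalty`, `select_diphone_set`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhoneInstance:
    """One phone occurrence in the database (hypothetical structure)."""
    label: str
    # Penalty per quality aspect, e.g. "spectrum", "duration", "voicing".
    # Higher values mean less desirable; how these are measured is not
    # specified in the abstract and is assumed to happen elsewhere.
    penalties: Dict[str, float] = field(default_factory=dict)


@dataclass
class DiphoneCandidate:
    """A diphone candidate spanning two adjacent phone instances."""
    name: str
    left: PhoneInstance
    right: PhoneInstance


def phone_penalty(phone: PhoneInstance) -> float:
    # Assumption: aspects are combined by a plain sum.
    return sum(phone.penalties.values())


def select_diphone_set(candidates: List[DiphoneCandidate]) -> Dict[str, DiphoneCandidate]:
    """Keep, for every diphone name, the candidate with the lowest total penalty."""
    best: Dict[str, tuple] = {}
    for cand in candidates:
        cost = phone_penalty(cand.left) + phone_penalty(cand.right)
        if cand.name not in best or cost < best[cand.name][0]:
            best[cand.name] = (cost, cand)
    return {name: cand for name, (cost, cand) in best.items()}


# Toy data: two candidates for the diphone "a-t", one for "t-a".
candidates = [
    DiphoneCandidate("a-t",
                     PhoneInstance("a", {"spectrum": 0.5, "duration": 0.25}),
                     PhoneInstance("t", {"plosive": 1.0})),
    DiphoneCandidate("a-t",
                     PhoneInstance("a", {"spectrum": 0.25}),
                     PhoneInstance("t", {"plosive": 0.5})),
    DiphoneCandidate("t-a",
                     PhoneInstance("t", {"plosive": 0.25}),
                     PhoneInstance("a", {"voicing": 0.5})),
]
selected = select_diphone_set(candidates)
```

In this toy example the second "a-t" candidate is kept because its summed penalty is lower. A weighted or phone-type-dependent combination of aspects would fit the same structure by changing only `phone_penalty`.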
Bibliographic reference. Ewender, Thomas / Pfister, Beat (2011): "Automatically creating a diphone set from a speech database", In INTERSPEECH-2011, 2169-2172.