Corpus-based speech synthesis systems deliver considerable synthesis quality, since unit selection approaches have been optimized over the last decade. Unit selection attempts to find the best combination of speech unit sequences in an inventory so that the perceptual differences between the expected (natural) and synthesized signals are as low as possible. However, mismatches and distortions still occur in concatenative speech synthesis, and they are normally perceptible in the synthesized waveform. Therefore, unit selection strategies and parameter tuning remain important issues in improving speech synthesis. We present a novel concept to increase the efficiency of the exhaustive speech unit search within the inventory via a unit selection model. This model bases its operation on a mapping analysis of the concatenation sub-costs, a Bayes optimal classification (BOC), and a maximum likelihood selection (MLS). The principal advantage of the proposed unit selection method is that it does not require exhaustive training to set up weighted coefficients for target and concatenation sub-costs. It provides an alternative for unit selection but requires further optimization, e.g., by integrating target cost mapping.
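For orientation, the conventional weighted-cost unit selection that the abstract contrasts against can be sketched as a Viterbi search over candidate units. This is a minimal illustration of that baseline only, not the proposed BOC/MLS model, whose details the abstract does not give. The function name `select_units`, the dictionary-based unit representation, the single `f0` feature, and the weights `w_target` and `w_concat` are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Unit = dict  # hypothetical unit representation: a dict of features


def select_units(
    candidates: Sequence[Sequence[Unit]],
    targets: Sequence[Unit],
    target_cost: Callable[[Unit, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
    w_target: float = 1.0,  # assumed weight; such weights normally need tuning
    w_concat: float = 1.0,  # assumed weight; such weights normally need tuning
) -> Tuple[List[int], float]:
    """Return (chosen candidate index per position, total weighted cost)."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")
    if len(candidates) != len(targets):
        raise ValueError("candidates and targets must have the same length")

    # best[j]: lowest cost of any path ending in candidate j at the current position
    best = [w_target * target_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of candidate j at t

    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for unit in candidates[t]:
            # Cost of reaching this unit from each candidate at the previous position
            joins = [
                best[i] + w_concat * concat_cost(prev, unit)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(joins)), key=joins.__getitem__)
            new_best.append(joins[i_min] + w_target * target_cost(targets[t], unit))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```

In this sketch the two weights have to be chosen by hand or by training, which is the tuning burden the abstract says the proposed method avoids.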
Bibliographic reference. Rosales, Abubeker Gamboa / Rosales, Hamurabi Gamboa / Hoffmann, Ruediger (2009): "Maximum likelihood unit selection for corpus-based speech synthesis", In INTERSPEECH-2009, 748-751.