8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A Family-of-Models Approach to HMM-based Segmentation for Unit Selection Speech Synthesis

John Kominek, Alan W Black

Carnegie Mellon University, USA

When segmenting a speech database, a family of acoustic models provides multiple estimates of each boundary point. Taking a consensus of these estimates is more robust than relying on a single estimate: large labeling errors become less prevalent in the synthesis catalog, which improves the resulting voice. This paper describes HMM-based segmentation in which up to 500 related models are applied to each wavefile. In a listening test of twelve utterances, human judges preferred the proposed technique over the baseline by a tally of 6 to 2, with 4 ties.
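
The consensus idea can be illustrated with a minimal sketch. The abstract does not say how the consensus value is computed, so the per-boundary median used here is an assumption, chosen only because it is insensitive to a single gross error. The function name `consensus_boundaries` and the boundary times are hypothetical and do not come from the paper.

```python
from statistics import median


def consensus_boundaries(estimates):
    """Combine boundary estimates from several models into one labeling.

    estimates: list of lists. Each inner list holds one model's boundary
    times (in seconds) for the same utterance, in the same order.
    Returns one consensus time per boundary.

    Assumption: the consensus is the per-boundary median. The paper's
    actual combination rule may differ.
    """
    if not estimates:
        raise ValueError("need at least one set of boundary estimates")
    n_boundaries = len(estimates[0])
    if any(len(e) != n_boundaries for e in estimates):
        raise ValueError("all models must label the same number of boundaries")
    # zip(*estimates) groups the estimates for each boundary across models.
    return [median(per_boundary) for per_boundary in zip(*estimates)]


# Hypothetical estimates from three models for three boundaries.
# The third model makes a gross error on the second boundary (0.60 s).
family = [
    [0.10, 0.25, 0.40],
    [0.11, 0.26, 0.41],
    [0.12, 0.60, 0.42],
]

consensus = consensus_boundaries(family)
print(consensus)
```

With these made-up values, the outlier at the second boundary does not shift the consensus label, whereas averaging the three estimates would pull that boundary well away from the two models that agree.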

Bibliographic reference. Kominek, John / Black, Alan W (2004): "A family-of-models approach to HMM-based segmentation for unit selection speech synthesis", in INTERSPEECH-2004, 1385-1388.