INTERSPEECH 2004 - ICSLP
When segmenting a speech database, a family of acoustic models provides multiple estimates of each boundary point. This is more robust than a single estimate: taking consensus values makes large labeling errors less prevalent in the synthesis catalog, which improves the resulting voice. This paper describes HMM-based segmentation in which up to 500 related models are applied to each wavefile. In a listening test of twelve utterances, human judges preferred the proposed technique over the baseline by a tally of 6 to 2, with 4 ties.
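The consensus idea can be illustrated with a minimal sketch. The abstract does not say how the consensus value is computed, so the per-boundary median used here is an assumption chosen for its resistance to outliers. The function name `consensus_boundaries` and the sample timings are hypothetical, not taken from the paper.

```python
from statistics import median


def consensus_boundaries(estimates):
    """Combine per-model boundary estimates into one consensus labeling.

    estimates: list of lists; estimates[m][i] is model m's time (in seconds)
    for boundary i. The median is an assumed consensus rule, not necessarily
    the one used in the paper.
    """
    if not estimates:
        raise ValueError("need estimates from at least one model")
    n = len(estimates[0])
    if any(len(e) != n for e in estimates):
        raise ValueError("every model must label the same number of boundaries")
    # zip(*estimates) groups the i-th boundary from every model together.
    return [median(column) for column in zip(*estimates)]


# Hypothetical estimates from five models for three boundaries.
estimates = [
    [0.10, 0.25, 0.40],
    [0.11, 0.26, 0.41],
    [0.09, 0.24, 0.39],
    [0.10, 0.55, 0.40],  # one model makes a gross error on the second boundary
    [0.12, 0.25, 0.42],
]
consensus = consensus_boundaries(estimates)
print(consensus)
```

In this toy input, the single gross error (0.55) does not move the consensus for the second boundary, whereas a labeling taken from that model alone would place the boundary 0.3 seconds late.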
Bibliographic reference. Kominek, John / Black, Alan W (2004): "A family-of-models approach to HMM-based segmentation for unit selection speech synthesis", In INTERSPEECH-2004, 1385-1388.