It is now possible to synthesise speech using HMMs with quality comparable to that of unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is model adaptation. It has been shown that supervised speaker adaptation can yield high-quality synthetic voices from an order of magnitude less data than is required to train a speaker-dependent model or to build a basic unit-selection system. Such supervised methods require labelled adaptation data for the target speaker. In this paper, we introduce a method capable of unsupervised adaptation, using only speech from the target speaker without any labelling.
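The general idea behind unsupervised adaptation can be illustrated with a toy sketch. This is not the method proposed in the paper: it only shows, in one dimension, how a model can label unlabelled target-speaker data itself and then shift its parameters toward that data. The "average-voice model" here is just a list of Gaussian means, the automatic labelling is nearest-mean assignment (a stand-in for running a recogniser), and the update is a MAP-style interpolation controlled by a hypothetical prior weight `tau`.

```python
# Toy sketch of unsupervised adaptation (illustrative only; not the
# paper's actual method). An "average-voice model" is a list of 1-D
# Gaussian means, one per phone class. Unlabelled target-speaker frames
# are first labelled automatically by the model itself, then each mean
# is interpolated toward the frames assigned to it.

def adapt_unsupervised(means, frames, tau=2.0):
    """Adapt 1-D Gaussian means using only unlabelled frames."""
    # Step 1: automatic labelling -- assign each frame to its nearest mean
    # (a crude stand-in for decoding the speech with a recogniser).
    assigned = {i: [] for i in range(len(means))}
    for x in frames:
        i = min(range(len(means)), key=lambda i: abs(x - means[i]))
        assigned[i].append(x)
    # Step 2: MAP-style update -- interpolate between the prior mean and
    # the sample mean of the automatically labelled frames.
    adapted = []
    for i, mu in enumerate(means):
        xs = assigned[i]
        n = len(xs)
        if n == 0:
            adapted.append(mu)  # no adaptation data: keep the prior mean
        else:
            sample_mean = sum(xs) / n
            adapted.append((tau * mu + n * sample_mean) / (tau + n))
    return adapted

# Average-voice means for two hypothetical phone classes, and unlabelled
# frames from a target speaker whose voice is shifted by roughly +1.
avg_voice = [0.0, 10.0]
target_frames = [0.8, 1.2, 1.0, 10.9, 11.1, 11.0]
print(adapt_unsupervised(avg_voice, target_frames))  # means move toward the speaker
```

Real systems adapt full HMM parameter sets with transforms such as MLLR rather than per-mean interpolation, and the automatic labelling would come from a speech recogniser; the sketch only conveys the absence of manual labels.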
Bibliographic reference. King, Simon / Tokuda, Keiichi / Zen, Heiga / Yamagishi, Junichi (2008): "Unsupervised adaptation for HMM-based speech synthesis", In INTERSPEECH-2008, 1869-1872.