9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Unsupervised Adaptation for HMM-Based Speech Synthesis

Simon King (1), Keiichi Tokuda (2), Heiga Zen (2), Junichi Yamagishi (1)

(1) University of Edinburgh, UK; (2) Nagoya Institute of Technology, Japan

It is now possible to synthesise speech using HMMs at a quality comparable to unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms; the most exciting of these is model adaptation. It has been shown that supervised speaker adaptation can yield high-quality synthetic voices with an order of magnitude less data than is required to train a speaker-dependent model or to build a basic unit-selection system. Such supervised methods, however, require labelled adaptation data from the target speaker. In this paper, we introduce a method capable of unsupervised adaptation, using only speech from the target speaker without any labelling.
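The abstract does not detail the paper's actual algorithm, but the general idea of unsupervised adaptation can be illustrated with a toy sketch: the current (average-voice) model hypothesizes labels for the target speaker's unlabelled feature frames, and those hypothesized labels are then used to re-estimate the model, here via a single global bias on the Gaussian means (in the spirit of a bias-only MLLR transform). All names, the nearest-mean "recogniser", and the two-phone toy model below are illustrative assumptions, not the authors' method.

```python
# Toy sketch of unsupervised adaptation (hypothetical; NOT the paper's exact
# algorithm). The model labels the unlabelled frames itself, then a single
# global mean bias is estimated from those hypothesized labels.

def nearest_label(frame, means):
    """Hypothesize a label for one feature frame: the nearest model mean."""
    return min(means,
               key=lambda lab: sum((f - m) ** 2
                                   for f, m in zip(frame, means[lab])))

def unsupervised_adapt(means, frames):
    """Label the frames with the current model, then shift every state mean
    by the average residual (one global bias shared across all states)."""
    dim = len(next(iter(means.values())))
    residual = [0.0] * dim
    for frame in frames:
        mu = means[nearest_label(frame, means)]
        for d in range(dim):
            residual[d] += frame[d] - mu[d]
    bias = [r / len(frames) for r in residual]
    return {lab: [m + b for m, b in zip(mu, bias)]
            for lab, mu in means.items()}

# Average-voice means for two toy "phones"; the target speaker's frames are
# offset by roughly +1.0 in each dimension, with no labels attached.
avg_voice = {"a": [0.0, 0.0], "i": [4.0, 4.0]}
target_frames = [[0.9, 1.1], [5.0, 5.1], [1.1, 0.9]]
adapted = unsupervised_adapt(avg_voice, target_frames)
# Every mean moves toward the target speaker, e.g. "a" from [0, 0] to ~[1, 1].
```

The key point the sketch makes is that no transcription of the target speaker's speech is needed: the labels are produced by the model itself, which is what distinguishes this setting from supervised adaptation.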


Bibliographic reference: King, Simon / Tokuda, Keiichi / Zen, Heiga / Yamagishi, Junichi (2008): "Unsupervised adaptation for HMM-based speech synthesis", in INTERSPEECH-2008, pp. 1869-1872.