16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

HMM Training Strategy for Incremental Speech Synthesis

Maël Pouget (1), Thomas Hueber (1), Gérard Bailly (1), Timo Baumann (2)

(1) GIPSA, France
(2) Universität Hamburg, Germany

Incremental speech synthesis aims at delivering the synthetic voice while the sentence is still being typed. One of the main challenges is the online estimation of the target prosody from partial knowledge of the sentence's syntactic structure. In the context of HMM-based speech synthesis, this typically results in missing segmental and suprasegmental features, which describe the linguistic context of each phoneme. This study describes a voice training procedure that explicitly integrates potential uncertainty about some contextual features. The proposed technique is compared to a previously published baseline approach, which replaces a missing contextual feature with a default value calculated on the training set. Both techniques were implemented in an HMM-based Text-To-Speech system for French and compared using objective and perceptual measurements. Experimental results show that the proposed strategy outperforms the baseline technique for this language.
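
The baseline substitution described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the default value is calculated, so the sketch assumes the most frequent value of each feature in the training set. The feature names (pos_in_phrase, next_phone), the MISSING marker, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical marker for a contextual feature that is not yet known
# because the rest of the sentence has not been typed.
MISSING = None


def default_values(training_contexts):
    """Return, for each feature, its most frequent value in the training set.

    Assumption: "default value" means the mode; the source does not specify this.
    """
    counts = {}
    for context in training_contexts:
        for name, value in context.items():
            if value is not MISSING:
                counts.setdefault(name, Counter())[value] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def fill_missing(context, defaults):
    """Replace each missing feature with its training-set default, if one exists."""
    return {
        name: defaults.get(name, value) if value is MISSING else value
        for name, value in context.items()
    }


# Toy training contexts with made-up feature names.
training = [
    {"pos_in_phrase": 1, "next_phone": "a"},
    {"pos_in_phrase": 2, "next_phone": "a"},
    {"pos_in_phrase": 2, "next_phone": "t"},
]
defaults = default_values(training)

# At synthesis time the position in the phrase is still unknown.
partial = {"pos_in_phrase": MISSING, "next_phone": "t"}
completed = fill_missing(partial, defaults)
```

Known features are left untouched; only the missing ones receive the default. The proposed strategy differs in that it accounts for the uncertainty during voice training rather than patching features at synthesis time.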


Bibliographic reference.  Pouget, Maël / Hueber, Thomas / Bailly, Gérard / Baumann, Timo (2015): "HMM training strategy for incremental speech synthesis", In INTERSPEECH-2015, 1201-1205.