This paper presents a computational model of human speech production based on the hypothesis that low-energy attractors of the human speech production system can be identified, and that interpolation and extrapolation along the key dimension of hypo/hyper-articulation can be motivated by energetic considerations of phonetic contrast. The model was implemented with an HMM-based speech synthesiser whose statistical models are continuously adapted. Two adaptation methods were proposed for vowel and consonant models, and their effectiveness was tested by showing that such hypo/hyper-articulation control can successfully manipulate the intelligibility of synthetic speech in noise. Objective evaluations with the ANSI Speech Intelligibility Index indicate that intelligibility in various types of noise is effectively controlled. In particular, for the hyper-articulation transforms, the improvement over unadapted speech is above 25%.
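The idea of moving along a hypo/hyper-articulation dimension can be illustrated with a minimal sketch. It assumes, purely for illustration, that a low-energy attractor can be represented as a single point in the same parameter space as a model mean, and that the adaptation is a linear interpolation or extrapolation controlled by a scalar. The abstract does not specify the form of the transforms, so the function `adapt_mean`, the parameter `alpha`, and all numeric values below are hypothetical and are not the authors' method.

```python
from typing import List, Sequence


def adapt_mean(mean: Sequence[float],
               attractor: Sequence[float],
               alpha: float) -> List[float]:
    """Move a model mean along the line through a (hypothetical) attractor.

    alpha == 1.0 leaves the mean unchanged (unadapted speech).
    0 <= alpha < 1 interpolates towards the attractor (hypo-articulation).
    alpha > 1 extrapolates away from the attractor (hyper-articulation).
    The linear form is an illustrative assumption, not taken from the paper.
    """
    if len(mean) != len(attractor):
        raise ValueError("mean and attractor must have the same dimensionality")
    return [a + alpha * (m - a) for m, a in zip(mean, attractor)]


# Hypothetical 2-D parameter values, chosen only to make the arithmetic easy.
vowel_mean = [2.0, 4.0]
attractor = [1.0, 1.0]

hypo = adapt_mean(vowel_mean, attractor, 0.5)   # halfway towards the attractor
hyper = adapt_mean(vowel_mean, attractor, 1.5)  # pushed away from the attractor

print(hypo)   # [1.5, 2.5]
print(hyper)  # [2.5, 5.5]
```

Under these assumptions a single scalar controls the degree of articulation continuously, with contrast relative to the attractor shrinking for values below 1 and growing for values above 1.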
Index Terms: reactive speech synthesis, hypo/hyper-articulated speech, intelligibility enhancement
Bibliographic reference. Nicolao, Mauro / Latorre, Javier / Moore, Roger K. (2012): "C2h: a computational model of H&h-based phonetic contrast in synthetic speech", In INTERSPEECH-2012, 987-990.