SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Evaluating Speech Intelligibility Enhancement for HMM-based Synthetic Speech in Noise

Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King

The Centre for Speech Technology Research, University of Edinburgh, UK

It is possible to increase the intelligibility of speech in noise by enhancing the clean speech signal. In this paper we demonstrate the effects of modifying the spectral envelope of synthetic speech according to the environmental noise. To achieve this, we modify Mel cepstral coefficients according to an intelligibility measure that accounts for glimpses of speech in noise: the Glimpse Proportion measure. We evaluate this method against a baseline synthetic voice trained only with normal speech and a topline voice trained with Lombard speech, as well as natural speech. The intelligibility of these voices was measured when mixed with speech-shaped noise and with a competing speaker at three different levels. The Lombard voices, both natural and synthetic, were more intelligible than the normal voices in all conditions. For speech-shaped noise, the proposed modified voice was as intelligible as the Lombard synthetic voice without requiring any recordings of Lombard speech, which are hard to obtain. However, in the case of competing talker noise, the Lombard synthetic voice was more intelligible than the proposed modified voice.
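
To make the idea of "glimpses" concrete, the following is a minimal sketch of a glimpse-proportion style score. It is not the authors' implementation. It assumes that speech and noise levels have already been computed per time-frequency cell in dB (the measure itself is normally defined on an auditory-model representation, which is omitted here), and that a cell counts as a glimpse when speech exceeds noise by more than a fixed local threshold. The function name `glimpse_proportion`, the 3 dB default, and the toy values are illustrative assumptions.

```python
def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Return the fraction of time-frequency cells in which the speech level
    exceeds the noise level by more than threshold_db.

    speech_db, noise_db: lists of frames, each frame a list of per-band
    levels in dB (assumed precomputed; the auditory front end is omitted).
    threshold_db: local SNR criterion for a glimpse (assumed value).
    """
    total = 0
    glimpsed = 0
    for speech_frame, noise_frame in zip(speech_db, noise_db):
        for s, n in zip(speech_frame, noise_frame):
            total += 1
            if s > n + threshold_db:
                glimpsed += 1
    if total == 0:
        raise ValueError("no time-frequency cells supplied")
    return glimpsed / total


# Toy example: 2 frames x 3 frequency bands, levels in dB (made-up values).
speech = [[60.0, 55.0, 40.0], [62.0, 50.0, 45.0]]
noise = [[50.0, 54.0, 50.0], [50.0, 50.0, 41.0]]
gp = glimpse_proportion(speech, noise)
print(gp)
```

Under these assumptions, reshaping the spectral envelope so that more cells rise above the noise raises the score, which is the intuition behind using such a measure to guide the modification of the Mel cepstral coefficients.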

Index Terms: HMM-based speech synthesis, intelligibility of speech in noise, Lombard speech

Bibliographic reference.  Valentini-Botinhao, Cassia / Yamagishi, Junichi / King, Simon (2012): "Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise", In SAPA-SCALE-2012, 22-27.