SAPA-SCALE Conference 2012
Portland, OR, USA
It is possible to increase the intelligibility of speech in noise by enhancing the clean speech signal. In this paper we demonstrate the effects of modifying the spectral envelope of synthetic speech according to the environmental noise. To achieve this, we modify Mel cepstral coefficients according to an intelligibility measure that accounts for glimpses of speech in noise: the Glimpse Proportion measure. We evaluate this method against a baseline synthetic voice trained only with normal speech and a topline voice trained with Lombard speech, as well as natural speech. The intelligibility of these voices was measured when mixed with speech-shaped noise and with a competing speaker at three different levels. The Lombard voices, both natural and synthetic, were more intelligible than the normal voices in all conditions. For speech-shaped noise, the proposed modified voice was as intelligible as the Lombard synthetic voice without requiring any recordings of Lombard speech, which are hard to obtain. However, in the case of competing-talker noise, the Lombard synthetic voice was more intelligible than the proposed modified voice.
Index Terms: HMM-based speech synthesis, intelligibility of speech in noise, Lombard speech
Bibliographic reference. Valentini-Botinhao, Cassia / Yamagishi, Junichi / King, Simon (2012): "Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise", In SAPA-SCALE-2012, 22-27.