Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
Motivated by the fact that words are not equally confusable, we explore the idea of using word-level intelligibility predictions to selectively boost the harder-to-understand words in a sentence, aiming to improve overall intelligibility in the presence of noise. First, the intelligibility of a set of words from dense and sparse phonetic neighbourhoods was evaluated in isolation. The resulting intelligibility scores were used to inform two sentence-level experiments. In the first experiment, the signal-to-noise ratio (SNR) of one word was boosted to the detriment of another word. Sentence intelligibility did not generally improve. The intelligibility of words in isolation and in a sentence was found to be significantly different, both in clean and in noisy conditions. In the second experiment, one word was selectively boosted while all other words in the sentence were slightly attenuated. This strategy was successful for words that were poorly recognised in that particular context. However, a reliable predictor of word-in-context intelligibility remains elusive since, as our results indicate, it requires semantic, syntactic and acoustic information about both the word and the sentence.
Index Terms: word confusability, neighbourhood density, HMM-based speech synthesis
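The abstract does not state the exact gain rule used for selective SNR boosting. The following is a minimal sketch, assuming that total sentence energy (and therefore the sentence-level SNR against a fixed noise masker) is held constant: the chosen word receives a fixed boost in dB, and a set of compensating words shares one common attenuation that removes exactly the energy added to the target. Passing a single compensating word mimics the first experiment (one word boosted at the expense of another); leaving `compensate` unset mimics the second (all other words slightly attenuated). The function name `selective_boost_gains`, its parameters and the energy-preserving constraint are hypothetical illustrations, not the authors' stated method.

```python
import math
from typing import List, Optional, Sequence


def selective_boost_gains(
    word_energies: Sequence[float],
    target: int,
    boost_db: float,
    compensate: Optional[Sequence[int]] = None,
) -> List[float]:
    """Hypothetical helper: per-word linear amplitude gains that boost one word.

    Assumption (not from the paper): the energy added to the target word is
    taken away from the compensating words with a single shared gain, so the
    total sentence energy, and hence sentence-level SNR, is unchanged.
    """
    n = len(word_energies)
    if compensate is None:
        # Second-experiment style: every other word is slightly attenuated.
        compensate = [i for i in range(n) if i != target]
    if target in compensate:
        raise ValueError("target word cannot also be a compensating word")

    g_target = 10 ** (boost_db / 20)  # dB -> linear amplitude gain
    extra = (g_target ** 2 - 1) * word_energies[target]  # energy added to target
    pool = sum(word_energies[i] for i in compensate)  # energy available to remove
    if pool <= 0 or extra >= pool:
        raise ValueError("boost too large to keep total sentence energy constant")

    g_comp = math.sqrt((pool - extra) / pool)  # shared attenuation gain
    gains = [1.0] * n  # words outside `compensate` are left untouched
    gains[target] = g_target
    for i in compensate:
        gains[i] = g_comp
    return gains


if __name__ == "__main__":
    energies = [1.0, 2.0, 0.5, 1.5]  # made-up per-word energies, for illustration
    exp1 = selective_boost_gains(energies, target=2, boost_db=3.0, compensate=[0])
    exp2 = selective_boost_gains(energies, target=2, boost_db=3.0)
    for label, gains in (("one compensating word", exp1), ("all other words", exp2)):
        print(label, [round(20 * math.log10(g), 2) for g in gains])
```

Sharing one gain across the compensating words keeps their levels relative to each other unchanged, so only the target word's prominence shifts. With several compensating words, each needs only a small attenuation, which matches the "slightly attenuating" description of the second experiment.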
Bibliographic reference. Valentini-Botinhao, Cassia / Wester, Mirjam / Yamagishi, Junichi / King, Simon (2013): "Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise", In SSW8, 113-118.