While current research in speech synthesis focuses on generating various speaking styles or emotions, very few studies have addressed the possibility of including phonetic variations that depend on the communicative situation of the target speech (sports commentaries, TV news, etc.). Yet significant phonetic variations have been observed, depending on communicative factors such as whether the speech is spontaneous or read, and whether or not it is broadcast in the media. This study analyzes whether these alternative pronunciations contribute to the plausibility of the message and should therefore be considered in synthesis. To this end, subjective tests are performed on synthesized French sports commentaries. They compare HMM-based speech synthesis using the genuine pronunciation with synthesis using a neutral NLP-produced phonetization. Results show that integrating the phonetic variations significantly improves the perceived naturalness of the generated speech. They also highlight the relative importance of the various types of variations and show that schwa elisions, in particular, play a crucial role in perceived naturalness.
Bibliographic reference. Brognaux, Sandrine / Picart, Benjamin / Drugman, Thomas (2014): "Speech synthesis in various communicative situations: impact of pronunciation variations", In INTERSPEECH-2014, 1524-1528.