Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence

Bidisha Sharma, S.R. Mahadeva Prasanna


Text-to-speech (TTS) synthesis systems have grown in popularity because of their wide practical usability. While most of the technologies developed aim to meet the requirements of a laboratory environment, practical applications are not limited to a specific environment. This work aims to improve the intelligibility of synthesized speech so that it can be deployed in real-world conditions. Based on a comparison of Lombard speech and speech produced in quiet, strength of excitation is found to play a crucial role in making speech intelligible in noisy conditions. A novel method for enhancing the strength of excitation is proposed, which makes the synthesized speech more intelligible in practical scenarios. A linear-prediction-analysis-based formant enhancement method is also employed to further improve intelligibility. The proposed enhancement framework is applied to synthesized speech and evaluated in the presence of different types and levels of noise. Subjective evaluation results show that the proposed method makes the synthesized speech usable in practical noisy environments.
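
The abstract does not spell out how the linear-prediction (LP) based formant enhancement is carried out. As a rough illustration of the general idea only, the sketch below estimates LP coefficients for one frame, moves the LP poles closer to the unit circle to narrow the formant bandwidths, and re-filters the LP residual through the sharpened all-pole filter. The pole-radius rule (r to r**power), the function names, and the parameter values are assumptions made for this example; they are not taken from the paper, and the strength-of-excitation enhancement is not covered here.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis; returns [1, a1, ..., ap] of A(z)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # tiny ridge for numerical safety
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def sharpen_poles(a, power=0.8):
    """Assumed rule: map each pole radius r to r**power (0 < power <= 1).

    For 0 < r < 1 this moves the pole toward the unit circle without
    crossing it, which narrows the corresponding spectral peak.
    """
    poles = np.roots(a)
    sharpened = np.abs(poles) ** power * np.exp(1j * np.angle(poles))
    return np.real(np.poly(sharpened))

def all_pole_filter(x, a):
    """Filter x through 1/A(z) with zero initial state."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def enhance_frame(frame, order=10, power=0.8):
    """Inverse-filter with A(z), then resynthesize with the sharpened poles."""
    a = lp_coefficients(frame, order)
    residual = np.convolve(frame, a)[:len(frame)]  # LP residual
    return all_pole_filter(residual, sharpen_poles(a, power))

# Example use on a synthetic resonant frame (not real speech).
rng = np.random.default_rng(0)
resonator = np.array([1.0, -2 * 0.9 * np.cos(0.3 * np.pi), 0.81])
frame = all_pole_filter(rng.standard_normal(200), resonator)
enhanced = enhance_frame(frame, order=4, power=0.8)
```

With power set to 1.0 the poles are unchanged and the frame is reconstructed; smaller values give stronger sharpening. A practical system would also need windowing, overlap-add across frames, and energy normalization, which are left out of this sketch.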


DOI: 10.21437/Interspeech.2016-1005

Cite as

Sharma, B., Prasanna, S.M. (2016) Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence. Proc. Interspeech 2016, 131-135.

BibTeX
@inproceedings{Sharma+2016,
author={Bidisha Sharma and S.R. Mahadeva Prasanna},
title={Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1005},
url={http://dx.doi.org/10.21437/Interspeech.2016-1005},
pages={131--135}
}