Improvement to a NAM captured whisper-to-speech system

Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Christian Jutten

This paper investigates new techniques to improve whisper-to-speech conversion in the framework of silent speech telephone communication. A preliminary method for converting Non-Audible Murmur (NAM) to modal speech, based on statistical mapping trained on aligned corpora, has been proposed. Although the technique is very promising, its performance is still insufficient because of the difficulty of estimating F0 from unvoiced speech. Two distinct modifications are proposed here to improve the naturalness of the synthesized speech. In the first, Linear Discriminant Analysis (LDA) is used instead of Principal Component Analysis (PCA) to reduce the dimensionality of the input spectral vectors, and the influence of long-term variation of spectral information on pitch estimation is examined. The second is an attempt to integrate visual information as a complementary input to improve spectral estimation, F0 estimation and the voicing decision.
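The contrast between PCA and LDA as dimensionality-reduction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for spectral vectors, the class labels are an assumption (the abstract does not say which classes supervise the LDA), and the helper names `pca_fit`, `lda_fit`, `nearest_mean_accuracy` and `demo` are hypothetical. The sketch only shows the general point that PCA keeps the directions of largest variance, whereas LDA keeps the directions that best separate labelled classes.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def lda_fit(X, y, n_components, reg=1e-6):
    """Return the data mean and the top Fisher discriminant directions."""
    mean = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(dim)  # small ridge keeps Sw invertible
    # Generalized eigenproblem Sb v = lambda Sw v, solved as inv(Sw) Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return mean, evecs[:, order[:n_components]].real


def nearest_mean_accuracy(Z, y):
    """Classify each projected point by its nearest class mean."""
    classes = np.unique(y)
    means = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dist = ((Z[:, None, :] - means[None]) ** 2).sum(axis=2)
    return float((classes[dist.argmin(axis=1)] == y).mean())


def demo(seed=0, n_per_class=200, dim=10):
    """Synthetic data (assumption): dimension 0 is high-variance noise,
    dimension 1 is low-variance but carries the class identity."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(y.size, dim))
    X[:, 0] *= 10.0
    X[:, 1] = 0.5 * X[:, 1] + 3.0 * y
    mean_p, W_p = pca_fit(X, 1)
    mean_l, W_l = lda_fit(X, y, 1)
    acc_pca = nearest_mean_accuracy((X - mean_p) @ W_p, y)
    acc_lda = nearest_mean_accuracy((X - mean_l) @ W_l, y)
    return acc_pca, acc_lda


if __name__ == "__main__":
    acc_pca, acc_lda = demo()
    print(f"1-D PCA accuracy: {acc_pca:.2f}, 1-D LDA accuracy: {acc_lda:.2f}")
```

On this toy data the single PCA component follows the noisy high-variance axis, while the single LDA component follows the axis that separates the classes. Whether a comparable gain appears on real NAM spectral features depends on the labels and the features used, which the abstract does not specify.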

doi: 10.21437/Interspeech.2008-422

Cite as: Tran, V.-A., Bailly, G., Loevenbruck, H., Jutten, C. (2008) Improvement to a NAM captured whisper-to-speech system. Proc. Interspeech 2008, 1465-1468, doi: 10.21437/Interspeech.2008-422

