INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Improvement to a NAM Captured Whisper-to-Speech System

Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Christian Jutten

GIPSA, France

In this paper, new techniques to improve whisper-to-speech conversion are investigated in the framework of silent speech telephone communication. A preliminary method for converting Non-Audible Murmur (NAM) to modal speech, based on a statistical mapping trained on aligned corpora, has previously been proposed. Although this technique is very promising, its performance is still insufficient because of the difficulty of estimating F0 from unvoiced speech. Two distinct modifications are therefore proposed here to improve the naturalness of the synthesized speech. In the first modification, LDA (Linear Discriminant Analysis) is used instead of PCA (Principal Component Analysis) to reduce the dimensionality of the input spectral vectors. In addition, the influence of long-term variation of spectral information on pitch estimation is examined. The second modification attempts to integrate visual information as a complementary input to improve spectral estimation, F0 estimation and the voicing decision.
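The first modification replaces PCA with LDA when reducing the dimensionality of the input spectral vectors. The difference between the two can be illustrated with a minimal NumPy sketch, which is not the authors' implementation. `pca_projection` and `lda_projection` are hypothetical helpers, the two-dimensional synthetic frames stand in for real NAM spectral vectors, and the per-frame class labels are an assumption, since the abstract does not say which classes the LDA discriminates. PCA keeps the directions of largest total variance, whereas LDA keeps the directions that best separate the labelled classes relative to their within-class spread.

```python
import numpy as np

def pca_projection(X, k):
    """Hypothetical helper: (d, k) matrix of the top-k principal directions of X (n, d)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]           # keep the largest-variance axes
    return eigvecs[:, order]

def lda_projection(X, y, k, reg=1e-6):
    """Hypothetical helper: (d, k) Fisher LDA projection; k <= n_classes - 1.
    Assumes each frame in X has a class label in y (label set is an assumption)."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))                           # within-class scatter
    Sb = np.zeros((d, d))                           # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        D = Xc - mc
        Sw += D.T @ D
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)                           # keep Sw positive definite
    # Whiten with the Cholesky factor Sw = L L^T so the generalized problem
    # Sb w = lambda Sw w becomes the symmetric problem (L^-1 Sb L^-T) u = lambda u.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    eigvals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(eigvals)[::-1][:k]           # most discriminative directions
    return Linv.T @ V[:, order]                     # map back: w = L^-T u

# Synthetic stand-in for labelled spectral frames: axis 0 has large variance
# but no class information; axis 1 has small variance but separates the classes.
rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)
nuisance = rng.normal(0.0, 10.0, size=2 * n)
separating = np.where(y == 0, -1.0, 1.0) + rng.normal(0.0, 0.3, size=2 * n)
X = np.column_stack([nuisance, separating])

pca_dir = pca_projection(X, 1)[:, 0]
lda_dir = lda_projection(X, y, 1)[:, 0]
lda_dir = lda_dir / np.linalg.norm(lda_dir)

# Reduced (1-D) features under each projection.
Z_pca = (X - X.mean(axis=0)) @ pca_projection(X, 1)
Z_lda = (X - X.mean(axis=0)) @ lda_projection(X, y, 1)
```

In this toy data the high-variance axis carries no class information, so PCA retains it while LDA selects the low-variance axis that separates the classes. The Cholesky whitening turns the generalized eigenproblem of LDA into a symmetric one that `numpy.linalg.eigh` can solve directly.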


Bibliographic reference: Tran, Viet-Anh / Bailly, Gérard / Loevenbruck, Hélène / Jutten, Christian (2008): "Improvement to a NAM captured whisper-to-speech system", In INTERSPEECH-2008, 1465-1468.