5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Signal Processing System for Having the Sound "Pop-Out" in Noise Thanks to the Image of the Speaker's Lips: New Advances Using Multi-Layer Perceptrons

Laurent Girin, Laurent Varin, Gang Feng, Jean-Luc Schwartz

Institut de la Communication Parlée de Grenoble, France

This paper describes improvements to a noisy-speech enhancement system based on the fusion of auditory and visual information. The system, presented in previous papers, was implemented for vowel-to-vowel and vowel-to-consonant transitions corrupted with white noise. It relies on an analysis-enhancement-synthesis process built on a linear prediction (LP) model of the signal: the LP filter is enhanced by associative tools that estimate cleaned LP parameters from both the noisy audio and the visual information. We recall the detailed structure of the system and then focus on an improvement to the associators themselves: basic neural networks (multi-layer perceptrons) are used instead of linear regression. We show that, for VCV transitions corrupted with white noise, neural networks improve the performance of the system in terms of intelligibility gain, distance measures and classification tests.
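To make the analysis-enhancement-synthesis idea concrete, here is a minimal sketch of one frame of processing. It is an illustration under stated assumptions, not the authors' implementation: the function names, the LP order, the use of raw LP coefficients as the associator's input and output, the lip-parameter vector, and the choice to re-use the noisy signal's LP residual as excitation are all hypothetical. The associator is passed in as a function, so it could be a linear regression or a small multi-layer perceptron trained elsewhere.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis; returns a with a[0] = 1."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz autocorrelation matrix, lightly regularised for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def inverse_filter(a, x):
    """Analysis: e[n] = sum_k a[k] x[n-k] (LP residual)."""
    return np.convolve(x, a)[: len(x)]

def all_pole_filter(a, e):
    """Synthesis: y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def mlp_associator(W1, b1, W2, b2):
    """Hypothetical one-hidden-layer perceptron (tanh units, linear output).
    The weights are assumed to have been trained elsewhere."""
    return lambda features: W2 @ np.tanh(W1 @ features + b1) + b2

def enhance_frame(noisy_frame, lip_params, associator, order=10):
    """One frame of analysis -> enhancement -> synthesis (illustrative only)."""
    a_noisy = lp_coefficients(noisy_frame, order)        # analysis
    residual = inverse_filter(a_noisy, noisy_frame)
    # Assumption: the associator maps [noisy LP coeffs, lip params]
    # to an estimate of the clean LP coefficients a[1..order].
    features = np.concatenate((a_noisy[1:], lip_params))
    a_clean = np.concatenate(([1.0], associator(features)))  # enhancement
    return all_pole_filter(a_clean, residual)            # synthesis
```

A useful sanity check on such a sketch: if the associator simply returns the noisy LP coefficients unchanged, synthesis undoes analysis and the frame is reconstructed as it was, so any change in the output comes from the associator alone.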


Bibliographic reference. Girin, Laurent / Varin, Laurent / Feng, Gang / Schwartz, Jean-Luc (1998): "A signal processing system for having the sound 'pop-out' in noise thanks to the image of the speaker's lips: new advances using multi-layer perceptrons", in ICSLP-1998, paper 0431.