14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Speaker Separation Using Visual Speech Features and Single-Channel Audio

Faheem Khan, Ben Milner

University of East Anglia, UK

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that estimate the quality and intelligibility of the extracted target speaker and a comparison is made of different perceptual gain transforms. These show that significant gains are achieved by the application of the perceptual gain function.
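
As a rough illustration of the filtering step described above, the sketch below computes a per-frequency-bin Wiener gain from estimated target and interferer power spectra and then reshapes the gains non-linearly. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how power spectra are predicted from visual features or which perceptual gain transforms were compared, so the input spectra here are hand-picked toy values and `power_law_transform` is a hypothetical stand-in for a perceptual gain transform. All function and parameter names are illustrative.

```python
def wiener_gain(target_power, interferer_power, floor=1e-12):
    """Per-bin Wiener gain: target / (target + interferer).

    In the paper these power estimates would come from visual features;
    here they are simply passed in (assumption for illustration).
    """
    return [
        t / max(t + i, floor)  # floor avoids division by zero in empty bins
        for t, i in zip(target_power, interferer_power)
    ]


def power_law_transform(gains, exponent=2.0):
    """Hypothetical non-linear gain transform (NOT from the paper).

    Raising gains in [0, 1] to a power > 1 leaves 0 and 1 unchanged and
    pushes intermediate gains toward 0, suppressing the interferer harder.
    """
    return [g ** exponent for g in gains]


def apply_gains(mixture_magnitude, gains):
    """Scale each bin of the mixture magnitude spectrum by its gain."""
    return [m * g for m, g in zip(mixture_magnitude, gains)]


# Toy three-bin example: target dominates bin 0, interferer dominates bin 2.
target_power = [4.0, 1.0, 0.0]
interferer_power = [0.0, 1.0, 4.0]
mixture_magnitude = [2.0, 2.0, 2.0]

gains = wiener_gain(target_power, interferer_power)
shaped_gains = power_law_transform(gains)
separated = apply_gains(mixture_magnitude, shaped_gains)
```

In this toy case the bin where both speakers have equal power receives a Wiener gain of 0.5, which the assumed power-law transform reduces to 0.25, while bins dominated by one speaker keep gains of 1 or 0.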

Bibliographic reference.  Khan, Faheem / Milner, Ben (2013): "Speaker separation using visual speech features and single-channel audio", In INTERSPEECH-2013, 3264-3268.