ISCA Archive Interspeech 2013

Speaker separation using visual speech features and single-channel audio

Faheem Khan, Ben Milner

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that assess the quality and intelligibility of the extracted target speech, and a comparison is made of different perceptual gain transforms. These show that significant gains are achieved by the application of the perceptual gain function.
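A minimal sketch of the processing chain the abstract describes, under stated assumptions: the target speaker's power spectrum is taken as already predicted from mouth-region visual features (the visual-to-audio mapping itself is not shown), the Wiener gain is the standard target-to-mixture power ratio, and the perceptual gain transform is an illustrative power-law compression with a spectral floor; the paper compares several such transforms, so `perceptual_gain`, `alpha`, and `floor` here are hypothetical choices, not the authors' exact method.

```python
import numpy as np

def wiener_gain(target_psd, mixture_psd, eps=1e-10):
    """Standard Wiener filter gain: estimated target power divided by
    mixture power in each time-frequency bin, clipped to [0, 1]."""
    return np.clip(target_psd / (mixture_psd + eps), 0.0, 1.0)

def perceptual_gain(gain, alpha=0.5, floor=0.05):
    """Illustrative non-linear perceptual transform of the Wiener gain:
    power-law compression plus a spectral floor (one possible transform
    of the kind the paper evaluates)."""
    return np.maximum(gain ** alpha, floor)

def separate(mixture_stft, visual_target_psd):
    """Apply the perceptually adjusted, visually-derived gain to the
    single-channel mixture STFT. `visual_target_psd` stands in for the
    target speaker's power spectrum predicted from visual features."""
    mixture_psd = np.abs(mixture_stft) ** 2
    g = perceptual_gain(wiener_gain(visual_target_psd, mixture_psd))
    return g * mixture_stft  # filtered STFT; invert to recover the waveform
```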


doi: 10.21437/Interspeech.2013-723

Cite as: Khan, F., Milner, B. (2013) Speaker separation using visual speech features and single-channel audio. Proc. Interspeech 2013, 3264-3268, doi: 10.21437/Interspeech.2013-723

@inproceedings{khan13_interspeech,
  author={Faheem Khan and Ben Milner},
  title={{Speaker separation using visual speech features and single-channel audio}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3264--3268},
  doi={10.21437/Interspeech.2013-723}
}