INTERSPEECH 2004 - ICSLP
Speaker segmentation is an important task in multi-party conversations, and overlapping speech poses a serious problem when segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of an eight-sensor microphone array and an omni-directional color camera. Multiple concurrent speeches are segmented by fusing the two heterogeneous sensors, and each segmented speech is further enhanced by a linearly constrained minimum variance (LCMV) beamformer. Even with coexisting wide-band sound sources and images of humans in a reverberant environment, the proposed system effectively separates multiple target speeches.
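The core enhancement step, a minimum variance beamformer with a distortionless constraint toward the localized speaker, can be sketched as follows. This is an illustrative narrowband MVDR (single-constraint LCMV) example for an eight-sensor uniform linear array; the geometry, frequency, and covariance model are assumptions for the sketch, not the paper's actual configuration.

```python
import numpy as np

def steering_vector(theta, n_mics=8, spacing=0.04, freq=1000.0, c=343.0):
    """Far-field steering vector for a uniform linear array.
    theta: direction of arrival in radians (assumed geometry, not the paper's)."""
    delays = spacing * np.arange(n_mics) * np.sin(theta) / c
    return np.exp(-2j * np.pi * freq * delays)

def mvdr_weights(R, d):
    """Minimum variance weights under the distortionless constraint w^H d = 1:
    w = R^{-1} d / (d^H R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Example: look direction 30 degrees, spatially white noise covariance
d = steering_vector(np.deg2rad(30))
R = np.eye(8)
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))  # -> 1.0 (unit gain toward the target speaker)
```

In the full LCMV formulation, additional linear constraints (e.g. nulls toward localized interfering speakers) can be stacked into a constraint matrix, which generalizes the single distortionless constraint used above.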
Bibliographic reference. Choi, Changkyu / Kong, Donggeon / Lee, Hyoung-Ki / Yoon, Sang Min (2004): "Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming", In INTERSPEECH-2004, 2301-2304.