8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Separation of Multiple Concurrent Speeches using Audio-visual Speaker Localization and Minimum Variance Beam-forming

Changkyu Choi, Donggeon Kong, Hyoung-Ki Lee, Sang Min Yoon

Samsung Advanced Institute of Technology, Korea

Speaker segmentation is an important task in multi-party conversations, and overlapping speech poses a serious problem when segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of a microphone array with eight sensors and an omni-directional color camera. Multiple concurrent speeches are segmented by fusing the information from these two heterogeneous sensors. Each segmented speech is then further enhanced by a linearly constrained minimum variance (LCMV) beamformer. Despite coexisting wide-band sound sources and pictures of humans in a reverberant environment, the proposed system effectively separates multiple target speeches.
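The abstract does not spell out the beamformer, so here is the textbook LCMV formulation as a reminder. At a single frequency bin, the weights minimize the output power w^H R w subject to the linear constraints C^H w = f, which gives w = R^{-1} C (C^H R^{-1} C)^{-1} f. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical uniform circular array of eight microphones with a 0.1 m radius, far-field narrowband sources, and a known spatial covariance R. The helper names `steering_vector` and `lcmv_weights` are also hypothetical. The constraints ask for unit gain toward a target direction (such as one supplied by speaker localization) and a null toward an interfering speaker.

```python
import numpy as np

def steering_vector(theta, freq, n_mics=8, radius=0.1, c=343.0):
    """Far-field narrowband steering vector for a HYPOTHETICAL uniform
    circular array (the array geometry is an assumption, not from the paper)."""
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    # Delay of each mic relative to the array centre for a plane wave
    # arriving from azimuth theta (radians); mics nearer the source lead.
    delays = -radius * np.cos(theta - mic_angles) / c
    return np.exp(-2j * np.pi * freq * delays)

def lcmv_weights(R, C, f):
    """LCMV solution: minimise w^H R w subject to C^H w = f,
    i.e. w = R^{-1} C (C^H R^{-1} C)^{-1} f."""
    Rinv_C = np.linalg.solve(R, C)     # R^{-1} C without forming an explicit inverse
    gram = C.conj().T @ Rinv_C         # C^H R^{-1} C
    return Rinv_C @ np.linalg.solve(gram, f)

# Illustrative setup (all values assumed): target at 30 deg, interferer at 120 deg.
freq = 1000.0
a_target = steering_vector(np.deg2rad(30), freq)
a_interf = steering_vector(np.deg2rad(120), freq)
# Spatial covariance: strong interferer plus weak uncorrelated sensor noise.
R = 10 * np.outer(a_interf, a_interf.conj()) + 0.01 * np.eye(8)
C = np.column_stack([a_target, a_interf])   # one constraint column per direction
f = np.array([1.0, 0.0])                    # pass the target, null the interferer
w = lcmv_weights(R, C, f)
response = C.conj().T @ w                   # should reproduce f
```

Using `np.linalg.solve` instead of `np.linalg.inv` avoids forming R^{-1} explicitly, which is numerically safer when R is poorly conditioned. In practice R would be estimated from the microphone signals in each frequency bin.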


Bibliographic reference.  Choi, Changkyu / Kong, Donggeon / Lee, Hyoung-Ki / Yoon, Sang Min (2004): "Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming", In INTERSPEECH-2004, 2301-2304.