EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a method based on excitation source information for enhancing speech degraded by speech from other speakers. Speech from multiple speakers is collected simultaneously over two spatially distributed microphones. The time delay of each speaker's speech between the two microphones is estimated using the excitation source information. A weight function is derived for each speaker from the estimated time delay and the excitation source information. The linear prediction (LP) residuals of the two microphone signals are processed separately using these weight functions, and speech signals are synthesized from the modified residuals, so that one speech signal per speaker is derived from each microphone signal. The synthesized speech signals of each speaker are then combined to produce the enhanced speech. In the combined signal, significant enhancement of the speech of one speaker relative to the other speakers was observed.
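The time-delay step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single source for simplicity, uses the magnitude of the LP residual as a crude stand-in for the excitation source information, and picks the delay as the peak of the cross-correlation between the two residuals. The function names (`lp_coefficients`, `lp_residual`, `estimate_delay`, `synth_voiced`), the LP order of 10, and the synthetic test signal are all assumptions made for illustration.

```python
import random


def lp_coefficients(x, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns [1, a1, ..., ap]."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Inverse-filter x with A(z): e[n] = x[n] + sum_k a[k] * x[n - k]."""
    p = len(a) - 1
    return [
        sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


def estimate_delay(res1, res2, max_lag):
    """Lag (samples) maximizing the cross-correlation of |residuals|.

    A positive result means the signal at microphone 2 lags microphone 1.
    """
    e1 = [abs(v) for v in res1]
    e2 = [abs(v) for v in res2]
    n = min(len(e1), len(e2))

    def xcorr(lag):
        lo, hi = max(0, -lag), min(n, n - lag)
        return sum(e1[i] * e2[i + lag] for i in range(lo, hi))

    return max(range(-max_lag, max_lag + 1), key=xcorr)


def synth_voiced(n, seed=0):
    """Toy voiced signal: irregular impulse train through a 2-pole resonator."""
    rng = random.Random(seed)
    exc = [0.0] * n
    t = 0
    while t < n:
        exc[t] = 1.0
        t += rng.randint(70, 90)
    y = [0.0] * n
    for i in range(n):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y[i] = exc[i] + 1.3 * y1 - 0.8 * y2
    return y


if __name__ == "__main__":
    true_delay = 7  # hypothetical inter-microphone delay in samples
    mic1 = synth_voiced(2000)
    mic2 = [0.0] * true_delay + mic1[:-true_delay]
    res1 = lp_residual(mic1, lp_coefficients(mic1, 10))
    res2 = lp_residual(mic2, lp_coefficients(mic2, 10))
    print("estimated delay:", estimate_delay(res1, res2, max_lag=20))
```

Working on the residual rather than the waveform is the point of the sketch: inverse filtering removes most of the vocal-tract resonance, leaving sharp peaks near the instants of excitation, which gives a narrower cross-correlation peak. With several simultaneous speakers, one peak per speaker would be expected; how the paper separates those peaks and builds the per-speaker weight functions is not specified in the abstract and is not attempted here.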
Bibliographic reference. Yegnanarayana, B. / Prasanna, S.R. Mahadeva / Doss, Mathew Magimai (2003): "Enhancement of speech in multispeaker environment", In EUROSPEECH-2003, 581-584.