ISCA Archive Interspeech 2005

Recognizing speech from simultaneous speakers

Bhiksha Raj, Rita Singh, Paris Smaragdis

In this paper we present and evaluate factored methods for recognizing simultaneous speech from multiple speakers in single-channel recordings. Factored methods decompose the problem of jointly recognizing the speech of all speakers into separate recognition of the speech of each speaker. To achieve this, the signal components of the target speaker must in each case be enhanced in some manner. We do this in two ways: with a speaker separation algorithm based on non-negative matrix factorization (NMF) that generates separated spectra for each speaker, and with a mask estimation method that generates spectral masks for each speaker, which must be used in conjunction with a missing-feature method that can recognize speech from partial spectral data. Experiments on synthetic mixtures of signals from the Wall Street Journal corpus show that both approaches can greatly improve the recognition of the individual signals in the mixture.
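The abstract does not spell out either algorithm, so the following is only a generic sketch of the two ideas it names, not the authors' implementation. It assumes magnitude spectrograms stored as non-negative arrays of shape (frequency bins, frames), per-speaker bases learned in advance from clean training data, and the standard multiplicative updates for the KL-divergence form of NMF. The function names `nmf_kl` and `separate`, the iteration count, and the choice of a binary "dominant speaker" mask are all illustrative assumptions.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H with multiplicative KL-divergence updates.

    If W is supplied it is held fixed and only the activations H are
    estimated; in that case `rank` is ignored.
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
        if update_W:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200):
    """Split a mixture spectrogram using fixed per-speaker bases.

    Returns separated spectra for speakers A and B, plus a binary mask
    that is True where speaker A is estimated to dominate.
    """
    W = np.hstack([W_a, W_b])
    _, H = nmf_kl(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]   # speaker A's estimated contribution
    est_b = W_b @ H[k:]   # speaker B's estimated contribution
    total = est_a + est_b + EPS
    # Distribute the observed mixture energy in proportion to the estimates.
    spec_a = V_mix * est_a / total
    spec_b = V_mix * est_b / total
    mask_a = est_a > est_b
    return spec_a, spec_b, mask_a
```

In this sketch, `spec_a` and `spec_b` stand in for the separated spectra that would be passed to a conventional recognizer, while `mask_a` (and its complement for speaker B) stands in for the spectral mask that a missing-feature recognizer would use to decide which spectral components to trust.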

doi: 10.21437/Interspeech.2005-852

Cite as: Raj, B., Singh, R., Smaragdis, P. (2005) Recognizing speech from simultaneous speakers. Proc. Interspeech 2005, 3317-3320, doi: 10.21437/Interspeech.2005-852

  author={Bhiksha Raj and Rita Singh and Paris Smaragdis},
  title={{Recognizing speech from simultaneous speakers}},
  booktitle={Proc. Interspeech 2005},