Sixth European Conference on Speech Communication and Technology
The recognition of distant-talking speech in noisy and reverberant environments is a key issue in any speech recognition system. A so-called hands-free speech recognition system plays an important role in a natural and friendly human-machine interface. In practical use, such a system must also handle multiple simultaneous sound sources, including multiple talkers as well as other noise sources. This paper proposes a novel method that recognizes multiple talkers simultaneously in real environments by extending the 3-D Viterbi search to a 3-D N-best search algorithm. While the 3-D Viterbi method finds only the single most likely path in the 3-D trellis space, the proposed method considers multiple hypotheses for each direction in every frame. The resulting N-best list contains combinations of the direction sequence and the phoneme sequence for the multiple sources. The paper investigates the performance of the proposed method through experiments using real utterances of multiple talkers.
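To make the contrast between a single-best and an N-best search concrete, here is a minimal sketch of an N-best search over a trellis indexed by frame, direction, and state. It is an illustration only, not the authors' implementation: the function name `nbest_3d_search`, the input layout (`emit`, `log_trans`, `log_dir_trans`), the uniform start prior, and the use of frame-level state paths instead of phoneme sequences are all assumptions made for this example.

```python
import heapq
from itertools import product

NEG_INF = float("-inf")


def nbest_3d_search(emit, log_trans, log_dir_trans, n_best):
    """Return the n_best highest-scoring (score, path) pairs.

    All inputs are hypothetical structures chosen for this sketch:
    emit[t][d][s]       : log-likelihood of frame t for direction d, state s
    log_trans[p][s]     : log-probability of moving from state p to state s
    log_dir_trans[q][d] : log-probability of moving from direction q to d
    A path is a tuple of (direction, state) pairs, one per frame.
    """
    num_dirs = len(emit[0])
    num_states = len(emit[0][0])
    cells = list(product(range(num_dirs), range(num_states)))

    # Frame 0: one hypothesis per cell (a uniform start prior is assumed).
    beams = {(d, s): [(emit[0][d][s], ((d, s),))] for d, s in cells}

    for t in range(1, len(emit)):
        new_beams = {}
        for d, s in cells:
            candidates = []
            # Extend every surviving hypothesis from every predecessor cell.
            for (q, p), hyps in beams.items():
                step = log_trans[p][s] + log_dir_trans[q][d]
                if step == NEG_INF:
                    continue  # forbidden transition
                for score, path in hyps:
                    candidates.append(
                        (score + step + emit[t][d][s], path + ((d, s),))
                    )
            # Keep up to n_best hypotheses per cell instead of only the best one.
            new_beams[(d, s)] = heapq.nlargest(
                n_best, candidates, key=lambda h: h[0]
            )
        beams = new_beams

    # Merge the per-cell lists into one global N-best list.
    final = [h for hyps in beams.values() for h in hyps]
    return heapq.nlargest(n_best, final, key=lambda h: h[0])
```

With `n_best=1` the sketch reduces to an ordinary Viterbi search over the same trellis, returning only the most likely path. Larger values keep competing direction and state sequences alive in every cell, which is the property the abstract relies on to recover more than one source.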
Bibliographic reference. Heracleous, Panikos / Yamada, Takeshi / Nakamura, Satoshi / Shikano, Kiyohiro (1999): "Simultaneous recognition of multiple sound sources based on 3-D N-best search using microphone array", In EUROSPEECH'99, 69-72.