Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Simultaneous Recognition of Multiple Sound Sources Based on 3-D N-Best Search Using Microphone Array

Panikos Heracleous, Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano

Graduate School of Information Science, Nara Institute of Science and Technology, Japan

The recognition of distant-talking speech in noisy and reverberant environments is a key issue in any speech recognition system. A so-called hands-free speech recognition system plays an important role in a natural and friendly human-machine interface. In practical use, such a system must also cope with multiple simultaneous sound sources, including multiple talkers as well as other noise sources. This paper proposes a novel method that recognizes multiple talkers simultaneously in real environments by extending the 3-D Viterbi search to a 3-D N-best search algorithm. While the 3-D Viterbi method finds the most likely path in the 3-D trellis space, the proposed method considers multiple hypotheses for each direction in every frame. Combinations of the direction sequence and the phoneme sequence of multiple sources are included in the N-best list. The paper investigates the performance of the proposed method through experiments using real utterances of multiple talkers.
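
To make the idea of an N-best search over a 3-D trellis concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `nbest_3d_search`, the input layout (per-frame log-likelihoods indexed by state and direction), and the separate state and direction transition tables are all assumptions made for illustration. It also ranks individual state/direction paths rather than phoneme sequences, and it assumes every (state, direction) pair is an allowed starting point.

```python
import heapq
import math


def nbest_3d_search(loglik, log_trans, log_dir_trans, n_best):
    """Toy N-best search over a (frame, state, direction) trellis.

    loglik[t][s][d]      : log-likelihood of frame t in state s when the
                           array is steered to direction d (assumed input).
    log_trans[sp][s]     : log-probability of moving from state sp to s.
    log_dir_trans[dp][d] : log-probability of moving from direction dp to d.
    Returns up to n_best tuples (score, state_path, direction_path),
    best first.
    """
    num_frames = len(loglik)
    num_states = len(loglik[0])
    num_dirs = len(loglik[0][0])

    # hyps[s][d] holds up to n_best partial paths ending in cell (s, d).
    hyps = [[[(loglik[0][s][d], (s,), (d,))] for d in range(num_dirs)]
            for s in range(num_states)]

    for t in range(1, num_frames):
        new_hyps = [[[] for _ in range(num_dirs)] for _ in range(num_states)]
        for s in range(num_states):
            for d in range(num_dirs):
                candidates = []
                for sp in range(num_states):
                    for dp in range(num_dirs):
                        step = (log_trans[sp][s] + log_dir_trans[dp][d]
                                + loglik[t][s][d])
                        if step == -math.inf:
                            continue  # forbidden transition
                        for score, states, dirs in hyps[sp][dp]:
                            candidates.append(
                                (score + step, states + (s,), dirs + (d,)))
                # Keep several hypotheses per cell instead of only the best
                # one, which is what distinguishes N-best from plain Viterbi.
                new_hyps[s][d] = heapq.nlargest(n_best, candidates)
        hyps = new_hyps

    # Merge the surviving hypotheses of all final cells into one list.
    final = [h for row in hyps for cell in row for h in cell]
    return heapq.nlargest(n_best, final)
```

With `n_best=1` the sketch reduces to an ordinary Viterbi search over the same trellis. With larger values, the returned list can contain paths that follow different directions, which is the property the abstract relies on to cover more than one source.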



Bibliographic reference. Heracleous, Panikos / Yamada, Takeshi / Nakamura, Satoshi / Shikano, Kiyohiro (1999): "Simultaneous recognition of multiple sound sources based on 3-D N-best search using microphone array", In EUROSPEECH'99, 69-72.