International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Multiple Sound Sources Recognition by a Microphone Array-Based 3-D N-Best Search with Likelihood Normalization

Panikos Heracleous, Satoshi Nakamura, Kiyohiro Shikano

(1) ATR Spoken Language Translation Research Labs, Kyoto, Japan
(2) Graduate School of Information Science, Nara Institute of Science and Technology, Japan

This paper deals with hands-free speech recognition and, in particular, with the simultaneous recognition of multiple sound sources. Our method is based on the 3-D Viterbi search, extended to a 3-D N-best search that enables the recognition of multiple sound sources. The baseline system integrates two existing technologies, the 3-D Viterbi search and the conventional N-best search, into a complete system. However, the first evaluation of the 3-D N-best search-based system showed that new ideas are necessary in order to build a system for the simultaneous recognition of multiple sound sources. Two factors were found to play an important role in the performance of our system, namely the different likelihood ranges of the sound sources and the direction-based separation of the hypotheses. To address these problems, we implemented likelihood normalization and a path distance-based clustering technique in the baseline 3-D N-best search-based system. The performance of our system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements when the two techniques described above were applied. The best results were obtained by applying both techniques and using a microphone array composed of 32 elements. In that case, the Word Accuracy for the two talkers was higher than 80% and the Simultaneous Word Accuracy (both sources correctly recognized simultaneously) was higher than 70%, which are very promising results.
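
The effect of differing likelihood ranges can be illustrated with a small sketch. The abstract does not specify how the normalization is computed, so the example below swaps in a simple stand-in: each direction's log-likelihoods are shifted so that its best hypothesis scores zero. The function names, directions, words, and scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (word, log-likelihood); illustrative type


def normalize_per_direction(
    hyps_by_direction: Dict[int, List[Hypothesis]],
) -> Dict[int, List[Hypothesis]]:
    """Shift each direction's scores so its best hypothesis scores 0.0.

    Assumption: a stand-in normalization, not the paper's actual formula.
    """
    normalized = {}
    for direction, hyps in hyps_by_direction.items():
        best = max(score for _, score in hyps)
        normalized[direction] = [(word, score - best) for word, score in hyps]
    return normalized


def pooled_n_best(
    hyps_by_direction: Dict[int, List[Hypothesis]], n: int
) -> List[Tuple[int, str, float]]:
    """Pool hypotheses from all directions and keep the n highest scores."""
    pooled = [
        (direction, word, score)
        for direction, hyps in hyps_by_direction.items()
        for word, score in hyps
    ]
    pooled.sort(key=lambda item: item[2], reverse=True)
    return pooled[:n]


# Hypothetical scores: the talker at 120 degrees has a lower likelihood range.
raw = {
    30: [("kyoto", -1200.0), ("tokyo", -1210.0)],
    120: [("nara", -1500.0), ("naha", -1512.0)],
}

top_raw = pooled_n_best(raw, 2)                            # one direction only
top_norm = pooled_n_best(normalize_per_direction(raw), 2)  # both directions
```

Without normalization, the pooled list in this toy case contains only hypotheses from the direction with the higher likelihood range, so the second talker never surfaces. After the per-direction shift, the best hypothesis of each direction competes on equal terms.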


Bibliographic reference. Heracleous, Panikos / Nakamura, Satoshi / Shikano, Kiyohiro (2001): "Multiple sound sources recognition by a microphone array-based 3-D N-best search with likelihood normalization", in HSC2001, 103-106.