9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots

Toru Takahashi (1), Shun'ichi Yamamoto (1), Kazuhiro Nakadai (2), Kazunori Komatani (1), Tetsuya Ogata (1), Hiroshi G. Okuno (1)

(1) Kyoto University, Japan; (2) Honda Research Institute Japan Co. Ltd., Japan

This paper addresses automatic generation of soft missing-feature masks (MFMs) based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation during recognition. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended into a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process in which both static and dynamic features are input. The ratio can be implemented as part of the soft mask. The average recognition rate with a soft MFM improved by about 5% for all directions compared with a conventional system based on a hard MFM. Word recognition rates improved from 70% to 80% for peripheral talkers and from 93% to 97% for front speech when speakers were 90 degrees apart.
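The difference between the hard and soft masks can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula, so the logistic form, the reliability input in [0, 1], and the names `threshold`, `slope`, and `static_weight` are all assumptions made for illustration. The two free parameters are assumed here to be the sigmoid's midpoint and steepness, and the contribution ratio of static features is modeled as a simple scale factor on the mask.

```python
import math


def hard_mask(reliability, threshold):
    """Zero-or-one mask: a frequency bin is either trusted or discarded."""
    return 1.0 if reliability > threshold else 0.0


def soft_mask(reliability, threshold, slope, static_weight=1.0):
    """Sigmoid mask scaled by a weight (hypothetical parameterization).

    threshold     -- assumed free parameter: reliability at which the
                     unscaled mask equals 0.5
    slope         -- assumed free parameter: steepness of the transition
    static_weight -- assumed stand-in for the contribution ratio of
                     static features
    """
    return static_weight / (1.0 + math.exp(-slope * (reliability - threshold)))


if __name__ == "__main__":
    # Compare both masks over a few illustrative reliability values.
    for r in (0.2, 0.45, 0.5, 0.55, 0.8):
        print(r, hard_mask(r, 0.5), round(soft_mask(r, 0.5, 10.0, 0.8), 3))
```

Under these assumptions, a bin whose reliability sits just below the threshold still contributes a reduced weight to the probability calculation instead of being dropped outright, which is the behavior a soft mask is meant to provide.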

Bibliographic reference.  Takahashi, Toru / Yamamoto, Shun'ichi / Nakadai, Kazuhiro / Komatani, Kazunori / Ogata, Tetsuya / Okuno, Hiroshi G. (2008): "Soft missing-feature mask generation for simultaneous speech recognition system in robots", In INTERSPEECH-2008, 992-995.