ISCA Archive Interspeech 2008

Soft missing-feature mask generation for simultaneous speech recognition system in robots

Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

This paper addresses automatic generation of soft missing-feature masks (MFMs) based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation of the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended into a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. The average recognition rate with a soft MFM improved by about 5% over all directions compared with a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.
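
The contrast between a hard and a soft mask can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula, so the sketch assumes the two free parameters are a sigmoid steepness and midpoint, that each frequency bin has a reliability score in [0, 1] derived from the leak energy estimate, and that the mask weights per-feature log-probabilities. The names `hard_mask`, `soft_mask`, `weighted_log_likelihood`, `steepness`, `midpoint`, and `static_weight` are all hypothetical.

```python
import math


def hard_mask(reliability, threshold):
    """Zero-or-one mask: 1.0 where the reliability score exceeds the threshold."""
    return [1.0 if r > threshold else 0.0 for r in reliability]


def soft_mask(reliability, steepness, midpoint, static_weight=1.0):
    """Sigmoid mask scaled by static_weight (assumed parameterization).

    steepness and midpoint stand in for the two free parameters mentioned
    in the abstract; static_weight stands in for the contribution ratio of
    static features. Output values lie strictly between 0 and static_weight.
    """
    return [
        static_weight / (1.0 + math.exp(-steepness * (r - midpoint)))
        for r in reliability
    ]


def weighted_log_likelihood(per_feature_log_probs, mask):
    """Assumed use of the mask: weight each feature's log-probability."""
    return sum(m * lp for m, lp in zip(mask, per_feature_log_probs))


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.9]          # illustrative reliability scores
    print(hard_mask(scores, 0.5))     # binary decisions per bin
    print(soft_mask(scores, 10.0, 0.5, static_weight=0.8))
```

With a very large `steepness` the soft mask approaches the hard mask scaled by `static_weight`, so under these assumptions the hard mask is a limiting case of the soft one.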


doi: 10.21437/Interspeech.2008-289

Cite as: Takahashi, T., Yamamoto, S., Nakadai, K., Komatani, K., Ogata, T., Okuno, H.G. (2008) Soft missing-feature mask generation for simultaneous speech recognition system in robots. Proc. Interspeech 2008, 992-995, doi: 10.21437/Interspeech.2008-289

@inproceedings{takahashi08_interspeech,
  author={Toru Takahashi and Shun'ichi Yamamoto and Kazuhiro Nakadai and Kazunori Komatani and Tetsuya Ogata and Hiroshi G. Okuno},
  title={{Soft missing-feature mask generation for simultaneous speech recognition system in robots}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={992--995},
  doi={10.21437/Interspeech.2008-289}
}