Interspeech'2005 - Eurospeech
Users require speech recognition systems that respond rapidly and are robust (highly accurate). Recognition accuracy suffers from additive noise, caused by ambient sound, and from convolutional noise, caused by the transfer characteristics of the acoustic space. Existing model adaptation techniques achieve robustness by combining HMM composition with CMN (cepstral mean normalization). Because they need both the additive noise sample and the user's speech sample to generate the required models, they cannot respond rapidly. The proposed technique generates noise-adapted models in a preliminary step and then normalizes the models' parameters using only the additive noise observed by the system. After the user's speech sample is captured, only CMN needs to be performed before recognition processing can start, so the response is rapid. Another innovation is the creation of several HMMs to cover the wide S/N range expected in real applications, which raises both accuracy and response speed. Simulations using artificial speech samples generated to represent 7 S/N values show that an extended version of the proposed technique achieves a 17.6% reduction in average recognition error compared to the basic HMM composition method.
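As a rough illustration of the CMN step mentioned in the abstract, the sketch below subtracts the per-coefficient mean over an utterance from each cepstral frame. It relies on the standard observation that a fixed convolutional distortion shows up as a roughly constant additive offset in the cepstral domain. The function name `cepstral_mean_normalize` and the toy feature values are assumptions made for this example; this is not the authors' implementation, and it does not cover the preliminary model adaptation or the multi-S/N HMM set.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, taken over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Toy data (hypothetical): 3 frames of 2 cepstral coefficients.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A constant offset stands in for a fixed channel (convolutional) distortion.
channel_offset = [0.5, -1.5]
observed = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized = cepstral_mean_normalize(observed)
```

In this toy case the normalized features of the distorted utterance match those of the undistorted one, because the constant offset is absorbed into the mean that CMN removes.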
Bibliographic reference. Kobashikawa, Satoshi / Takahashi, Satoshi / Yamaguchi, Yoshikazu / Ogawa, Atsunori (2005): "Rapid response and robust speech recognition by preliminary model adaptation for additive and convolutional noise", In INTERSPEECH-2005, 965-968.