Speech recognition systems are prone to severe degradation in noisy environments due to mismatch between training and testing conditions. A multi-stream approach for keyword spotting is proposed to improve robustness in mismatched conditions. The assumption is that most real-world noises are colored and do not affect the full spectrum equally, meaning certain parts of the spectrum can still provide reliable information characterizing the utterance. In the proposed method for keyword spotting, the full frequency band is split into several sub-bands, each of which contains both static and delta parameters. Robustness is achieved by using only features from the sub-bands with the highest signal-to-noise ratio (SNR) during recognition, while ignoring sub-bands that are strongly affected by noise. The problem is how to correctly select and combine the useful bands for accurate recognition, without prior knowledge of the noise characteristics. In this paper we propose a new likelihood ratio, used both to select usable bands and to provide a confidence measure for robust keyword spotting. Tests carried out using the TiDigits database show a significant improvement in keyword spotting performance compared to a product-based approach. In addition, including a non-keyword test set from Resource Management results in a reduction in Equal Error Rate.
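The contrast between a product-based combination and a selective combination of sub-bands can be illustrated with a minimal sketch. It is not the likelihood ratio proposed in the paper, whose form the abstract does not give. The function names, the fixed number of retained bands `k`, and the per-band log-likelihood values are all hypothetical, and a low sub-band likelihood is assumed here to stand in for a sub-band with low SNR.

```python
import numpy as np

def product_score(band_loglik):
    """Product-based combination: multiplying the likelihoods of all
    sub-bands is the same as summing their log-likelihoods."""
    return float(np.sum(np.asarray(band_loglik, dtype=float)))

def top_k_score(band_loglik, k):
    """Selective combination (illustrative assumption): keep only the k
    sub-bands with the largest log-likelihoods and sum those."""
    band_loglik = np.asarray(band_loglik, dtype=float)
    if not 1 <= k <= band_loglik.size:
        raise ValueError("k must be between 1 and the number of sub-bands")
    # np.sort is ascending, so the last k entries are the k largest.
    return float(np.sum(np.sort(band_loglik)[-k:]))

# Hypothetical per-band log-likelihoods for one keyword hypothesis.
# In the noisy case, the third sub-band is assumed to be corrupted.
clean = np.array([-4.0, -5.0, -4.5, -5.5])
noisy = np.array([-4.0, -5.0, -40.0, -5.5])

clean_product = product_score(clean)   # all four bands contribute
noisy_product = product_score(noisy)   # dominated by the corrupted band
clean_top3 = top_k_score(clean, 3)     # drops the weakest band
noisy_top3 = top_k_score(noisy, 3)     # drops the corrupted band
```

In this toy case a single corrupted band shifts the product score by 35.5 in the log domain, while the top-3 score changes by only 1.0. Note that scores computed over different numbers of bands are not directly comparable without some form of normalization, and the sketch makes no attempt to provide one.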
Cite as: Conn, C., Ming, J., Hanna, P. (2004) Robust keyword spotting using a multi-stream approach. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 260-267
@inproceedings{conn04_specom,
  author={Cheryl Conn and Ji Ming and Philip Hanna},
  title={{Robust keyword spotting using a multi-stream approach}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={260--267}
}