In speech recognition for real-world applications, the performance degradation caused by the mismatch between training and testing environments must be overcome. In this paper, to reduce this mismatch, we propose a hybrid method that combines spectral subtraction with residual noise masking. We also employ a multiple-model approach to improve robustness across various noise environments. In this approach, multiple model sets are trained according to several noise masking levels, and a model set appropriate for the estimated noise level is then selected automatically in the recognition phase. In speaker-independent isolated word recognition experiments in car noise environments, the proposed method, using model sets with only two masking levels, reduces the average word error rate by 60% in comparison with the spectral subtraction method.
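The processing chain described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so everything below is an assumption, including the function names, the use of the first few frames as a speech-free noise estimate, plain power-domain subtraction clamped at zero, masking as a floor on each spectral bin, and choosing the model set whose masking level lies nearest to the estimated noise level in dB.

```python
import math

def estimate_noise(frames, n_noise_frames):
    """Average the first n_noise_frames power spectra (assumed speech-free)."""
    n_bins = len(frames[0])
    return [sum(f[k] for f in frames[:n_noise_frames]) / n_noise_frames
            for k in range(n_bins)]

def spectral_subtract(frame, noise_power):
    """Subtract the noise estimate from one power spectrum, clamping at zero."""
    return [max(p - n, 0.0) for p, n in zip(frame, noise_power)]

def mask_residual(frame, mask_level):
    """Noise masking: raise every bin to at least mask_level,
    hiding residual noise that falls below that level."""
    return [max(p, mask_level) for p in frame]

def select_model_set(noise_level_db, masking_levels_db):
    """Index of the masking level nearest to the estimated noise level.
    The nearest-level rule is an assumption, not taken from the paper."""
    return min(range(len(masking_levels_db)),
               key=lambda i: abs(masking_levels_db[i] - noise_level_db))

def enhance(frames, n_noise_frames, masking_levels_db):
    """Return masked spectra and the index of the selected model set."""
    noise_power = estimate_noise(frames, n_noise_frames)
    mean_noise = sum(noise_power) / len(noise_power)
    noise_level_db = 10.0 * math.log10(mean_noise + 1e-12)
    idx = select_model_set(noise_level_db, masking_levels_db)
    mask_level = 10.0 ** (masking_levels_db[idx] / 10.0)
    enhanced = [mask_residual(spectral_subtract(f, noise_power), mask_level)
                for f in frames]
    return enhanced, idx

# Toy input: two noise-only frames followed by one frame containing speech.
frames = [[2.0, 2.0], [2.0, 2.0], [12.0, 2.5]]
enhanced, idx = enhance(frames, n_noise_frames=2, masking_levels_db=[0.0, 10.0])
```

In such a setup, the returned index would pick the model set trained with the same masking level, so that the features seen at recognition time are masked in the same way as the features used in training.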
Cite as: Song, M.G., Jung, H.I., Shim, K.-J., Kim, H.S. (1998) Speech recognition in car noise environments using multiple models according to noise masking levels. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1065, doi: 10.21437/ICSLP.1998-332
@inproceedings{song98_icslp,
  author={Myung Gyu Song and Hoi In Jung and Kab-Jong Shim and Hyung Soon Kim},
  title={{Speech recognition in car noise environments using multiple models according to noise masking levels}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1065},
  doi={10.21437/ICSLP.1998-332}
}