One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features

Masakiyo Fujimoto, Hisashi Kawai


This paper introduces a method for noise-robust automatic speech recognition (ASR) that remains effective under one-pass single-channel processing. Under these constraints, single-channel speech enhancement seems to be a reasonable noise-robust approach to ASR, because complicated techniques requiring multi-pass processing cannot be used. However, in many cases, single-channel speech enhancement seriously degrades the accuracy of ASR because of speech distortion. In addition, the advanced acoustic modeling framework (joint training) is relatively ineffective in the case of single-channel processing. To overcome these problems, we propose a noise-robust acoustic modeling framework based on a feature-level combination of noisy speech and enhanced speech. To obtain further improvements, we also adopt a sub-network-level combination of noisy and enhanced speech, and a gating mechanism that can dynamically select appropriate speech features. Through comparative evaluations, we confirm that the proposed method successfully improves the accuracy of ASR in noisy environments under the one-pass single-channel constraints.
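The abstract does not give the exact formulation, so the following is only a minimal sketch of what a feature-level combination and a gating mechanism could look like. It assumes per-frame feature vectors, frame-wise concatenation for the combination, and a sigmoid gate that mixes the two streams. The function names, the gate form, and the parameters `weights` and `bias` are hypothetical and are not taken from the paper.

```python
import math
import random


def combine_features(noisy, enhanced):
    """Feature-level combination (assumed form): concatenate each noisy
    frame with the enhanced frame at the same time index."""
    return [n + e for n, e in zip(noisy, enhanced)]


def gated_combination(noisy, enhanced, weights, bias):
    """Hypothetical gate: a sigmoid of a linear function of the concatenated
    frame yields a per-dimension weight in [0, 1] that mixes the noisy and
    enhanced features."""
    mixed = []
    for n, e in zip(noisy, enhanced):
        x = n + e  # concatenated input to the gate
        frame = []
        for d in range(len(n)):
            z = sum(w * xi for w, xi in zip(weights[d], x)) + bias[d]
            g = 1.0 / (1.0 + math.exp(-z))
            frame.append(g * n[d] + (1.0 - g) * e[d])
        mixed.append(frame)
    return mixed


# Toy data: 5 frames of 4-dimensional features (random placeholders,
# not real speech features).
random.seed(0)
frames, dim = 5, 4
noisy = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(frames)]
enhanced = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(frames)]
weights = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

combined = combine_features(noisy, enhanced)            # 5 frames x 8 dims
gated = gated_combination(noisy, enhanced, weights, bias)  # 5 frames x 4 dims
```

In this sketch the concatenated features would feed the acoustic model directly, while the gated output keeps the original dimensionality and lets the gate lean toward the noisy features when enhancement distorts the speech. In a real system the gate parameters would be learned jointly with the acoustic model rather than drawn at random.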


DOI: 10.21437/Interspeech.2019-1270

Cite as: Fujimoto, M., Kawai, H. (2019) One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features. Proc. Interspeech 2019, 486-490, DOI: 10.21437/Interspeech.2019-1270.


@inproceedings{Fujimoto2019,
  author={Masakiyo Fujimoto and Hisashi Kawai},
  title={{One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={486--490},
  doi={10.21437/Interspeech.2019-1270},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1270}
}