Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition

Ji Ming, Danny Crookes


We describe the theory and implementation of full-sentence speech correlation for speech recognition, and demonstrate its superior robustness to unseen (untrained) noise. On the Aurora 2 data, with models trained on clean speech only, the new method performs competitively against state-of-the-art systems that use multicondition training and adaptation, and achieves the lowest word error rate at very low SNR (-5 dB). Further experiments with highly nonstationary noise (pop song, broadcast news, etc.) show the surprising ability of the new method to handle unpredictable noise. The new method adds several novel developments to our previous research. These include modeling speaker characteristics, along with other acoustic and semantic features of speech, to separate speech from noise, and a novel Viterbi algorithm that implements full-sentence correlation for speech recognition.


DOI: 10.21437/Interspeech.2019-2127

Cite as: Ming, J., Crookes, D. (2019) Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition. Proc. Interspeech 2019, 436-440, DOI: 10.21437/Interspeech.2019-2127.


@inproceedings{Ming2019,
  author={Ji Ming and Danny Crookes},
  title={{Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={436--440},
  doi={10.21437/Interspeech.2019-2127},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2127}
}