Interspeech'2005 - Eurospeech
Extensive research has been devoted over the past several years to robustness in the presence of various types and degrees of environmental noise, yet this remains one of the main problems facing automatic speech recognition (ASR) systems. This paper describes a new variable frame rate analysis technique, based upon searching a predefined lookahead interval for the next frame position that maximizes the first-order difference of the log energy (ΔE) between consecutive frames. The application of this novel technique to noise-robust ASR front-end processing is also reported. In comparison with existing variable frame rate methods in the literature, the proposed energy search approach is simpler and achieves similar recognition accuracy improvements at lower complexity. Experimental work on the Aurora II connected digits database reveals that the proposed front-end, together with cumulative distribution mapping, achieves average digit recognition accuracies of 78.32% for a model set trained from clean data and 89.95% for a model set trained from data with multiple noise conditions, representing 6.1% and 2.3% reductions in word error rate, respectively, over a cumulative distribution mapping baseline.
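The energy search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame length, the lookahead bounds (`min_shift`, `max_shift`), the search step, the use of the absolute value of ΔE, and the rule that ties go to the longest shift are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def log_energy(signal, start, frame_len):
    """Log energy of the frame beginning at sample index `start`."""
    frame = signal[start:start + frame_len]
    energy = sum(x * x for x in frame)
    # Small floor (assumed value) avoids log(0) on silent frames.
    return math.log(energy + 1e-10)


def energy_search_frames(signal, frame_len=200, min_shift=40,
                         max_shift=160, step=20):
    """Pick frame start positions by searching a lookahead interval.

    From the current frame, every candidate start within
    [min_shift, max_shift] samples ahead is scored by |delta E|, the
    absolute difference in log energy relative to the current frame.
    The highest-scoring candidate becomes the next frame.
    All parameter values are hypothetical defaults.
    """
    last_start = len(signal) - frame_len
    if last_start < 0:
        return []  # signal shorter than one frame

    positions = [0]
    pos = 0
    while True:
        current_e = log_energy(signal, pos, frame_len)
        # Candidates are listed from longest to shortest shift, so a tie
        # (e.g. in a stationary region) resolves to the longest shift.
        # This tie-breaking rule is an assumption of the sketch.
        candidates = [pos + d
                      for d in range(max_shift, min_shift - 1, -step)
                      if pos + d <= last_start]
        if not candidates:
            break
        pos = max(candidates,
                  key=lambda c: abs(log_energy(signal, c, frame_len)
                                    - current_e))
        positions.append(pos)
    return positions


if __name__ == "__main__":
    # Toy signal: silence followed by a constant-amplitude segment.
    toy = [0.0] * 1000 + [1.0] * 1000
    print(energy_search_frames(toy))
```

In this sketch the frame shift varies between `min_shift` and `max_shift` samples, so the resulting frame rate depends on how quickly the log energy changes. A real front-end would then compute features (for example cepstral coefficients) at the selected positions; that stage is omitted here.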
Bibliographic reference. Epps, Julien / Choi, Eric H. C. (2005): "An energy search approach to variable frame rate front-end processing for robust ASR", In INTERSPEECH-2005, 2613-2616.