13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition

Weifeng Li (1), Hervé Bourlard (2)

(1) Department of Electronic Engineering / Graduate School at Shenzhen, Tsinghua University, China
(2) Idiap Research Institute, Martigny, Switzerland

Log energy and its delta parameters, typically derived from the full-band spectrum, are commonly used in automatic speech recognition (ASR) systems. In this paper, we address the problem of estimating log energy in the presence of background noise, which usually reduces the dynamic range of the spectral energies. We show theoretically that background noise distorts the trajectories of the "conventional" log energy and its delta parameters, so that they become very poor estimates of the actual values and no longer describe the speech signal. We therefore propose to estimate log energy from the sub-band spectrum and then stretch its dynamic range. In speech recognition experiments conducted on the CENSREC-2 in-car database, the proposed log energy (and its corresponding delta parameters) performs very well, yielding an average relative improvement of 27.2% over the baseline front-ends. Moreover, further improvement can be achieved by incorporating new MFCCs obtained through non-linear spectral contrast stretching.
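The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sub-band limits or the stretching function, so the 300-3400 Hz band, the linear min-max stretch, the frame settings, and the synthetic signals (a 1 kHz tone standing in for speech, low-frequency tones standing in for in-car noise) are all assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_energy(x, sr=16000, band=None, frame_len=400, hop=160):
    """Per-frame log energy from the power spectrum.

    band=None uses the full band; band=(lo_hz, hi_hz) keeps only the
    bins inside that sub-band (hypothetical limits, not from the paper).
    """
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    if band is not None:
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        power = power[:, (freqs >= band[0]) & (freqs <= band[1])]
    return np.log(power.sum(axis=1) + 1e-10)

def stretch_dynamic_range(e, target_range=10.0):
    """Linear min-max stretch to [0, target_range] (assumed form)."""
    lo, hi = e.min(), e.max()
    return (e - lo) / max(hi - lo, 1e-10) * target_range

# Synthetic example: "speech" is a 1 kHz tone present in the second half
# of a one-second signal; "noise" is a pair of low-frequency tones.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 1000 * t) * (t >= 0.5)
noise = 0.5 * np.sin(2 * np.pi * 60 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)
noisy = clean + noise

e_full = log_energy(noisy, sr)                    # conventional full-band
e_sub = log_energy(noisy, sr, band=(300, 3400))   # sub-band estimate
e_stretched = stretch_dynamic_range(e_sub, target_range=10.0)

range_full = e_full.max() - e_full.min()
range_sub = e_sub.max() - e_sub.min()
print(f"full-band dynamic range: {range_full:.2f}")
print(f"sub-band dynamic range:  {range_sub:.2f}")
```

In this toy setting the noise sits almost entirely outside the chosen sub-band, so the sub-band log energy keeps a much wider gap between speech and non-speech frames than the full-band one. Real in-car noise is broader, which is presumably why an additional stretching step is applied.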

Bibliographic reference. Li, Weifeng / Bourlard, Hervé (2012): "Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition", in INTERSPEECH-2012, 314-317.