8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Evaluating the Temporal Structure Normalisation Technique on the Aurora-4 Task

Xiong Xiao (1), Eng Siong Chng (1), Haizhou Li (2)

(1) Nanyang Technological University, Singapore
(2) Institute for Infocomm Research, Singapore

We evaluate temporal structure normalisation (TSN), a feature normalisation technique for robust speech recognition, on the large-vocabulary Aurora-4 task. TSN normalises the trend of the feature's power spectral density (PSD) function to a reference function using finite impulse response (FIR) filters. The features are cepstral coefficients, and the normalisation is performed on every cepstral channel of each utterance. Experimental results show that TSN yields relative reductions in average word error rate (WER) of 7.20% and 8.16% over the mean-variance normalisation (MVN) and histogram equalisation (HEQ) baselines, respectively. We further evaluate two other state-of-the-art temporal filters; of the three temporal filters evaluated, the TSN filter performs best. Lastly, our results also demonstrate that fixed smoothing filters are less effective on the Aurora-4 task than on the Aurora-2 task.
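
The idea of filtering each cepstral channel so that its PSD moves towards a reference can be illustrated with a minimal sketch. This is not the authors' implementation: the Welch-style PSD estimate, the Hann-windowed inverse-FFT filter design, the 64-point FFT, the 33-tap filter length, and all function names (psd_estimate, design_tsn_filter, apply_tsn) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def psd_estimate(x, n_fft=64):
    """Welch-style PSD estimate of one feature trajectory (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    if len(x) < n_fft:                       # pad very short utterances
        x = np.pad(x, (0, n_fft - len(x)))
    win = np.hanning(n_fft)
    hop = n_fft // 2
    segs = np.array([x[i:i + n_fft] * win
                     for i in range(0, len(x) - n_fft + 1, hop)])
    spec = np.abs(np.fft.rfft(segs, axis=1)) ** 2
    return spec.mean(axis=0) / (win ** 2).sum()

def design_tsn_filter(psd_test, psd_ref, n_taps=33, eps=1e-10):
    """Linear-phase FIR filter whose magnitude response approximates
    sqrt(psd_ref / psd_test). Requires odd n_taps <= n_fft."""
    mag = np.sqrt((psd_ref + eps) / (psd_test + eps))
    h = np.fft.irfft(mag)                    # zero-phase, circularly symmetric
    h = np.roll(h, n_taps // 2)[:n_taps]     # centre the response and truncate
    return h * np.hanning(n_taps)            # taper to smooth the response

def apply_tsn(features, psd_ref, n_fft=64, n_taps=33):
    """Filter every channel of one utterance.

    features: (n_frames, n_channels), with n_frames >= n_taps assumed.
    psd_ref:  (n_channels, n_fft // 2 + 1) reference PSDs, e.g. averaged
              over clean training utterances (an assumption of this sketch).
    """
    features = np.asarray(features, dtype=float)
    out = np.empty_like(features)
    for c in range(features.shape[1]):
        psd_test = psd_estimate(features[:, c], n_fft)
        h = design_tsn_filter(psd_test, psd_ref[c], n_taps)
        out[:, c] = np.convolve(features[:, c], h, mode="same")
    return out
```

Because the filter is redesigned for each channel of each utterance, it adapts to the distortion actually observed, in contrast to a fixed smoothing filter that applies the same response everywhere. When the test and reference PSDs coincide, the designed filter in this sketch reduces to a unit impulse and leaves the features unchanged.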

Bibliographic reference. Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2007): "Evaluating the temporal structure normalisation technique on the Aurora-4 task", In INTERSPEECH-2007, 1070-1073.