We evaluate temporal structure normalisation (TSN), a feature normalisation technique for robust speech recognition, on the large-vocabulary Aurora-4 task. TSN uses finite impulse response (FIR) filters to normalise the trend of the power spectral density (PSD) function of each feature trajectory to a reference function. The features are cepstral coefficients, and the normalisation is applied to every cepstral channel of each utterance. Experimental results show that TSN reduces the average word error rate (WER) by a relative 7.20% and 8.16% over the mean-variance normalisation (MVN) and histogram equalisation (HEQ) baselines, respectively. We further evaluate two other state-of-the-art temporal filters; of the three temporal filters evaluated, the TSN filter performs best. Lastly, our results also demonstrate that fixed smoothing filters are less effective on the Aurora-4 task than on the Aurora-2 task.
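The per-channel filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`welch_psd`, `design_tsn_filter`, `apply_tsn`), the FFT size, the filter length, the Hann windows, and the choice of a filter magnitude response equal to the square root of the reference-to-utterance PSD ratio are all assumptions made for illustration. The reference PSD is assumed to be supplied, one row per cepstral channel.

```python
import numpy as np

def welch_psd(x, n_fft=64):
    """Averaged periodogram of a 1-D trajectory (a simple Welch-style estimate)."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))  # zero-pad very short utterances
    hop = n_fft // 2
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spectra.mean(axis=0) / np.sum(window ** 2)

def design_tsn_filter(psd_utt, psd_ref, n_taps=33, eps=1e-10):
    """Symmetric FIR filter whose magnitude response approximates
    sqrt(psd_ref / psd_utt). n_taps is assumed odd and smaller than the FFT size."""
    gain = np.sqrt(psd_ref / (psd_utt + eps))
    h = np.fft.fftshift(np.fft.irfft(gain))  # zero-phase response, centred
    centre, half = len(h) // 2, n_taps // 2
    h = h[centre - half:centre + half + 1] * np.hanning(n_taps)  # truncate, taper
    return h / h.sum()  # unit gain at zero modulation frequency

def apply_tsn(features, psd_ref, n_fft=64, n_taps=33):
    """features: (frames, channels); psd_ref: (channels, n_fft // 2 + 1).
    A separate filter is designed for every channel of the utterance."""
    features = np.asarray(features, dtype=float)
    out = np.empty_like(features)
    for c in range(features.shape[1]):
        psd_utt = welch_psd(features[:, c], n_fft)
        h = design_tsn_filter(psd_utt, psd_ref[c], n_taps)
        out[:, c] = np.convolve(features[:, c], h, mode="same")
    return out
```

In this sketch, an utterance whose PSD already matches the reference yields a filter that is close to a unit impulse, so the features pass through almost unchanged; the utterance is assumed to have at least `n_taps` frames.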
Bibliographic reference. Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2007): "Evaluating the temporal structure normalisation technique on the Aurora-4 task", In INTERSPEECH-2007, 1070-1073.