This paper proposes a within-class feature normalization (WCFN) framework for robust speech recognition that operates in a transformed segment-level, rather than frame-level, super-vector space. Each segment hypothesis in a lattice is represented by a high-dimensional super-vector and projected onto a class-dependent, lower-dimensional eigen-subspace to remove unwanted variability due to environmental noise and speaker differences (for example, SNR, noise type and gender). The normalized super-vectors are then verified by a bank of class detectors in order to rescore the lattice. On the Aurora 2 multi-condition training task, the proposed WCFN approach achieved an average word error rate (WER) of 7.45%. WCFN outperformed the multi-condition training baseline (Multi-Con., 13.72%), the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%) and histogram equalization (HEQ, 8.66%), and came close to the non-blind reference model weighting approach (RMW, 7.29%).
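The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each class-dependent eigen-subspace is obtained by PCA over the training super-vectors of that class, and that the leading k eigenvectors are kept; the abstract does not specify either choice. The function names, the class labels, the dimensions and the random data are all hypothetical.

```python
import numpy as np

def fit_class_subspace(X, k):
    """Estimate a k-dimensional eigen-subspace for one class (assumed: plain PCA).

    X has shape (n_segments, dim): one super-vector per training segment.
    Returns the class mean and a (dim, k) orthonormal basis.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are eigenvectors of the within-class covariance,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def project(x, mean, basis):
    """Project one super-vector onto a class-dependent subspace."""
    return (x - mean) @ basis

# Hypothetical data: two classes, 50 training segments each, 20-dim super-vectors.
rng = np.random.default_rng(0)
train = {c: rng.normal(size=(50, 20)) for c in ("one", "two")}
models = {c: fit_class_subspace(X, k=5) for c, X in train.items()}

# A segment hypothesis is normalized once per candidate class.
x = rng.normal(size=20)
z = {c: project(x, *models[c]) for c in models}
```

In a full system, each normalized vector `z[c]` would be passed to the detector for class `c`, and the detector scores would be used to rescore the lattice. That stage is omitted here because the abstract gives no details of the detectors.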
Bibliographic reference. Liao, Yuan-Fu / Hsu, Chi-Hui / Yang, Chi-Min / Lin, Jeng-Shien / Chang, Sen-Chia (2008): "Within-class feature normalization for robust speech recognition", In INTERSPEECH-2008, 1020-1023.