This paper presents a new approach to acoustic sniffing for irrelevant variability normalization (IVN) based acoustic model training and speech recognition. Given a training corpus, a so-called i-vector is extracted from each training speech segment. A clustering algorithm is then used to group the training i-vectors into multiple clusters, each corresponding to an acoustic condition. Acoustic sniffing can then be implemented by comparing the i-vector extracted from a speech segment with the centroid of each cluster and selecting the most similar cluster. Experimental results on the Switchboard-1 conversational telephone speech transcription task suggest that the i-vector based acoustic sniffing outperforms our previous Gaussian mixture model (GMM) based approach. The proposed approach is efficient enough to handle very large-scale training corpora on current mainstream computing platforms, and it has very low run-time cost.
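The cluster-then-match procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the similarity measure, so the sketch assumes a simple k-means on length-normalized vectors with cosine similarity, and it assumes the i-vectors have already been extracted. The function names `cluster_ivectors` and `sniff` are hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def cluster_ivectors(ivectors, num_clusters, num_iters=20, seed=0):
    """Cluster training i-vectors; return one unit-length centroid per cluster.

    Assumption: k-means with cosine similarity. The abstract only says
    "a clustering algorithm", so this choice is illustrative.
    """
    rng = np.random.default_rng(seed)
    x = normalize(np.asarray(ivectors, dtype=float))
    # Initialize centroids from distinct, randomly chosen training i-vectors.
    centroids = x[rng.choice(len(x), size=num_clusters, replace=False)]
    for _ in range(num_iters):
        # Assign every i-vector to its most similar centroid.
        labels = np.argmax(x @ centroids.T, axis=1)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        for k in range(num_clusters):
            members = x[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
        centroids = normalize(centroids)
    return centroids


def sniff(ivector, centroids):
    """Return the index of the cluster (acoustic condition) most similar to ivector."""
    v = normalize(np.asarray(ivector, dtype=float)[None, :])
    return int(np.argmax(v @ centroids.T))
```

In this sketch, training calls `cluster_ivectors` once on all training i-vectors, and run time needs only one dot product per cluster in `sniff`, which is consistent with the low run-time cost claimed above.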
Bibliographic reference. Xu, Jian / Zhang, Yu / Yan, Zhi-Jie / Huo, Qiang (2011): "An i-vector based approach to acoustic sniffing for irrelevant variability normalization based acoustic model training and speech recognition", In INTERSPEECH-2011, 1701-1704.