Towards Robust Speech Recognition: Structured Modeling, Irrelevant Variability Normalization and Unsupervised Online Adaptation

Dr. Qiang Huo
Microsoft Research Asia

In the past several years, we have been studying several approaches to robust automatic speech recognition (ASR) based on three key concepts, namely structured modeling, irrelevant variability normalization (IVN) and unsupervised online adaptation (OLA). In structured modeling of basic speech units, speech information relevant to phonetic classification is modeled by traditional hidden Markov models (HMMs), while factors irrelevant to phonetic classification are handled by an auxiliary module. An IVN-based training procedure can then be designed to estimate the parameters of the generic HMMs and the auxiliary module from a large amount of diversified training data. At the recognition stage, the parameters of the auxiliary module can be updated via unsupervised OLA using the unknown utterance itself; the utterance is then recognized again with compensated models composed from the generic HMMs and the adapted auxiliary module to achieve better performance. In this talk, the speaker will explain these key concepts and the associated methodology, and elaborate on several robust ASR techniques derived from them, which have achieved state-of-the-art performance over the years on both the Aurora2 and Aurora3 tasks.
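The two-pass unsupervised OLA flow described above can be illustrated with a minimal, self-contained sketch. This is not the system presented in the talk: the generic HMMs are replaced by per-class diagonal Gaussians and the auxiliary module by a single feature-space bias vector, both hypothetical stand-ins chosen only to keep the example runnable. The unknown utterance is first decoded with the uncompensated models, the bias is estimated from the utterance itself, and the compensated features are decoded again.

import numpy as np

class DiagGaussianClass:
    """Stand-in for the generic acoustic model of one basic speech unit."""

    def __init__(self, mean, var):
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)

    def log_likelihood(self, frames):
        # Sum of per-frame diagonal-Gaussian log-likelihoods.
        diff = frames - self.mean
        return -0.5 * np.sum(diff * diff / self.var
                             + np.log(2.0 * np.pi * self.var))

def decode(frames, models):
    """'Recognition' pass: return the best-scoring unit and its score."""
    scores = {label: m.log_likelihood(frames) for label, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def adapt_bias(frames, model):
    """Unsupervised OLA step: ML estimate of a feature-space bias that
    normalizes the utterance toward the first-pass hypothesis."""
    return model.mean - frames.mean(axis=0)

def recognize_with_ola(frames, models):
    # Pass 1: decode with the uncompensated generic models.
    first_hyp, _ = decode(frames, models)
    # Update the auxiliary module (the bias) from the unknown utterance itself.
    bias = adapt_bias(frames, models[first_hyp])
    # Pass 2: re-decode the compensated features with the same generic models.
    return decode(frames + bias, models)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models = {
        "unit_a": DiagGaussianClass(mean=[0.0, 0.0], var=[1.0, 1.0]),
        "unit_b": DiagGaussianClass(mean=[3.0, 3.0], var=[1.0, 1.0]),
    }
    # Simulate an utterance from "unit_a" corrupted by an unknown channel offset,
    # i.e. variability that is irrelevant to phonetic classification.
    clean = rng.normal(0.0, 1.0, size=(50, 2))
    noisy = clean + np.array([1.0, 1.0])
    label1, score1 = decode(noisy, models)
    label2, score2 = recognize_with_ola(noisy, models)
    print(f"pass 1 (no OLA): {label1}, log-likelihood {score1:.1f}")
    print(f"pass 2 (OLA)   : {label2}, log-likelihood {score2:.1f}")

In the actual IVN framework the auxiliary module and its training are considerably richer (its parameters are estimated together with those of the generic HMMs from a large amount of diversified training data, as described above), but the second recognition pass follows the same compose-and-re-decode pattern.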
Cite as: Huo, Q. (2008) Towards Robust Speech Recognition: Structured Modeling, Irrelevant Variability Normalization and Unsupervised Online Adaptation. Proc. International Symposium on Chinese Spoken Language Processing
@inproceedings{huo08_iscslp,
  author={Qiang Huo},
  title={{Towards Robust Speech Recognition: Structured Modeling, Irrelevant Variability Normalization and Unsupervised Online Adaptation}},
  year=2008,
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing}
}