To make full use of a small development data set to build a robust dialectal Chinese speech recognizer from a standard Chinese speech recognizer (based on Chinese Initial/Final, IF), a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated, where a shared-state of standard tri-IF is merged with a state of dialectal mono-IF in terms of pronunciation variation modeling. Specifically, in order to deal with phonetic-level pronunciation variations in SDPBMM, distance-based pronunciation modeling is proposed based on a small dialectal Chinese data set. With a 40-minute Shanghai-dialectal Chinese data set, SDPBMM can achieve a significant syllable error rate (SER) reduction of 14.3% for dialectal Chinese with almost no performance degradation for standard Chinese. Experimentally, SDPBMM can also outperform the maximum likelihood linear regression (MLLR) adaptation and the pooled retraining methods with relative SER reductions by 2.8% and 10.6%, respectively. If SDPBMM is combined with the MLLR adaptation, another relative SER reduction of 3.3% can be further achieved.
Bibliographic reference. Liu, Linquan / Zheng, Thomas Fang / Akabane, Makoto / Chen, Ruxin / Wu, Wenhu (2007): "Using a small development set to build a robust dialectal Chinese speech recognizer", In INTERSPEECH-2007, 1729-1732.