Sixth International Conference on Spoken Language Processing
Pronunciation in spontaneous Mandarin speech tends to be much more variable than in read speech. In current recognition systems, pronunciation dictionaries usually contain only one standard pronunciation for each word, so the amount of variability that can be modelled is very limited. Most recent research on modelling variations in spontaneous speech focuses on the lexicon level, which can only handle intra-word variations; inter-word variations cannot be modelled effectively. Chinese is monosyllabic and has a simple syllable structure, which gives rise to a large number of pronunciation variations. In this paper, we propose two methods to model pronunciation variations in spontaneous Mandarin speech. First, we generate a probability lexicon to model intra-syllable variations by using a dynamic programming (DP) alignment algorithm between base-form and surface strings. Second, we integrate variation probabilities into the decoder to model intra-syllable as well as inter-syllable variations. Experimental results show that modelling intra-syllable variation with a probability lexicon reduces the syllable error rate by 0.85% (a phone error rate reduction of 1.4%), while additionally modelling inter-syllable variation reduces the syllable error rate significantly, by 4.76% (a phone error rate reduction of 7.6%), compared to the baseline system.
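The first method, DP alignment of base-form and surface strings followed by probability estimation, can be illustrated with a minimal sketch. It is not the authors' implementation: the unit edit costs, the function names `align` and `variation_probabilities`, the gap symbol, and the toy phone labels are all assumptions made for illustration, and the abstract does not specify how the actual alignment costs or probabilities were defined.

```python
from collections import Counter, defaultdict

GAP = "-"  # assumed placeholder for a deleted or inserted phone


def align(base, surface):
    """Align two phone sequences with unit-cost edit-distance DP (an assumption)."""
    n, m = len(base), len(surface)
    # cost[i][j] = minimum edit cost of aligning base[:i] with surface[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1])
            cost[i][j] = min(match, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover aligned phone pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1])):
            pairs.append((base[i - 1], surface[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((base[i - 1], GAP))  # base phone deleted
            i -= 1
        else:
            pairs.append((GAP, surface[j - 1]))  # surface phone inserted
            j -= 1
    pairs.reverse()
    return pairs


def variation_probabilities(examples):
    """Estimate P(surface phone | base phone) by relative frequency."""
    counts = defaultdict(Counter)
    for base, surface in examples:
        for b, s in align(base, surface):
            counts[b][s] += 1
    return {
        b: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for b, ctr in counts.items()
    }


# Toy data (hypothetical): the base form "zh i" is realised once as "z i".
examples = [
    (["zh", "i"], ["z", "i"]),
    (["zh", "i"], ["zh", "i"]),
]
probs = variation_probabilities(examples)
```

In this toy run, `probs["zh"]` assigns equal weight to the canonical and the variant realisation. In a probability lexicon of the kind described above, such estimates would be attached to alternative pronunciations of each syllable.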
Bibliographic reference. Liu, Yi / Fung, Pascale (2000): "Modelling pronunciation variations in spontaneous Mandarin speech", In ICSLP-2000, vol.3, 630-633.