INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Phonetic Confusion Analysis and Robust Phone Set Generation for Shanghai-Accented Mandarin Speech Recognition

Guo-Hong Ding

Nokia Research Center, China

In this paper, accent issues are discussed for Shanghai-accented Mandarin speech recognition. Phonetic confusion is analyzed in detail based on the alignment between the surface-form and baseform transcriptions. Mutual information is used as the measure to extract the most confusable phoneme pairs. It was found that each phoneme in such a pair is easily misrecognized as the other. To remove the phonetic confusion, it is better to replace the two phonemes in a pair with a single newly generated one; new phone sets are derived accordingly. The phonetic confusion analysis and the experimental evaluation are performed on a Shanghai-accented Mandarin speech corpus. Experimental results show that, compared to the canonical phone set, the generated phone set greatly reduces substitution errors and achieves a 0.72% absolute reduction in Chinese character error rate (CER). When it is combined with pronunciation modeling, the absolute CER reduction is 1.58%.
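The pair-extraction and merging steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation, which the abstract does not give. It assumes the alignment is available as a list of (baseform phoneme, surface phoneme) tuples, and it scores each unordered pair by summing the off-diagonal terms p(b,s) log2(p(b,s) / (p(b) p(s))) in both directions. The function names `confusion_scores` and `merge_pair`, and the merged symbol name, are hypothetical.

```python
import math
from collections import Counter


def confusion_scores(aligned_pairs):
    """Score unordered phoneme pairs from (baseform, surface) alignments.

    Assumption: each off-diagonal cell (b != s) contributes its
    mutual-information term p(b,s) * log2(p(b,s) / (p(b) * p(s))),
    and the two directions of a pair are added together.
    """
    joint = Counter(aligned_pairs)
    total = sum(joint.values())

    # Marginal counts for baseform and surface phonemes.
    base, surf = Counter(), Counter()
    for (b, s), n in joint.items():
        base[b] += n
        surf[s] += n

    scores = Counter()
    for (b, s), n in joint.items():
        if b == s:
            continue  # correct realizations are not confusions
        p_bs = n / total
        p_b = base[b] / total
        p_s = surf[s] / total
        scores[frozenset((b, s))] += p_bs * math.log2(p_bs / (p_b * p_s))
    return scores


def merge_pair(phone_seq, pair, merged_symbol):
    """Replace both phonemes of a confusable pair with one new symbol."""
    return [merged_symbol if p in pair else p for p in phone_seq]
```

Under these assumptions, `confusion_scores(pairs).most_common(1)` gives the highest-scoring pair, and `merge_pair` can then be applied to the lexicon and transcriptions to obtain a reduced phone set. How many pairs to merge is a tuning choice that this sketch leaves open.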


Bibliographic reference. Ding, Guo-Hong (2008): "Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition", in INTERSPEECH-2008, 1129-1132.