7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper, we describe a technique for modeling frequent allophones in Japanese speech recognition. The consonant-vowel (CV) syllable structure is dominant in Japanese, and the frequency distribution of CV pairs is rather skewed. Isolating the most frequent allophones by adding dedicated phonemes to the acoustic model can improve recognition accuracy. By introducing ten new phonemes for the five most common CV pairs, we achieved a 30% relative reduction in word error rate for spontaneous speech and a 6% relative reduction overall across all speech categories in a Japanese broadcast news transcription task.
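As a rough illustration of the idea in the abstract, the sketch below counts CV pairs in phoneme-level transcriptions, selects the most frequent ones, and relabels their phonemes with pair-specific symbols. It is not the authors' implementation. The function names, the `c_cv` / `v_cv` symbol format, the toy transcriptions, and the reading that each selected pair receives one new consonant symbol and one new vowel symbol (ten symbols for five pairs) are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: the five Japanese vowels written as single romanized symbols.
VOWELS = {"a", "i", "u", "e", "o"}


def count_cv_pairs(utterances):
    """Count adjacent (consonant, vowel) phoneme pairs across utterances."""
    counts = Counter()
    for phones in utterances:
        for c, v in zip(phones, phones[1:]):
            # Toy rule: any symbol that is not a vowel is treated as a consonant.
            if c not in VOWELS and v in VOWELS:
                counts[(c, v)] += 1
    return counts


def relabel_frequent_pairs(phones, frequent):
    """Replace both phonemes of each frequent CV pair with dedicated symbols.

    The 'k_ka' / 'a_ka' naming is hypothetical, not taken from the paper.
    """
    out = []
    i = 0
    while i < len(phones):
        pair = tuple(phones[i:i + 2])
        if len(pair) == 2 and pair in frequent:
            c, v = pair
            out.append(f"{c}_{c}{v}")  # consonant variant tied to this pair
            out.append(f"{v}_{c}{v}")  # vowel variant tied to this pair
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return out


# Toy phoneme transcriptions (made up, not data from the paper).
utterances = [["k", "a", "n", "o"], ["n", "o", "k", "a"], ["t", "o"]]
counts = count_cv_pairs(utterances)

# The paper selects the five most common pairs; two suffice for this toy set.
frequent = {pair for pair, _ in counts.most_common(2)}
relabeled = relabel_frequent_pairs(["k", "a", "t", "o"], frequent)
```

In a real system the relabeled transcriptions would feed acoustic model training, so that the frequent allophones get their own models instead of sharing parameters with less frequent contexts.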
Bibliographic reference. Nguyen, Long / Guo, Xuefeng / Makhoul, John (2002): "Modeling frequent allophones in Japanese speech recognition", In ICSLP-2002, 709-712.