7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Modeling Frequent Allophones in Japanese Speech Recognition

Long Nguyen, Xuefeng Guo, John Makhoul

BBN Technologies, USA

In this paper, we describe a technique for modeling frequent allophones in Japanese speech recognition. The consonant-vowel (CV) syllable structure is dominant in Japanese, and the frequency distribution of CV pairs is rather skewed. Isolating the most frequent allophones by adding dedicated phonemes to the acoustic model yields better recognition accuracy. By introducing ten new phonemes for the five most common CV pairs, we achieved a 30% relative reduction in word error rate for spontaneous speech and a 6% relative reduction overall across all speech categories in a Japanese broadcast news transcription task.
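
As a rough illustration of the idea (not the authors' implementation), the sketch below counts CV pairs in a phone-level lexicon, picks the most frequent ones, and relabels each occurrence with two dedicated symbols, one for the consonant and one for the vowel. That split is an assumption consistent with "ten new phonemes for the five most common CV pairs". The toy lexicon, the vowel set, the function names, and the `x_in_cv` naming scheme are all hypothetical.

```python
from collections import Counter

# Assumed vowel inventory for a romanized phone set (hypothetical).
VOWELS = {"a", "i", "u", "e", "o"}


def top_cv_pairs(pronunciations, vowels=VOWELS, n=5):
    """Return the n most frequent adjacent consonant-vowel pairs."""
    counts = Counter()
    for phones in pronunciations:
        for c, v in zip(phones, phones[1:]):
            if c not in vowels and v in vowels:
                counts[(c, v)] += 1
    return [pair for pair, _ in counts.most_common(n)]


def relabel(phones, frequent):
    """Replace each frequent CV pair with two dedicated phoneme symbols."""
    out = []
    i = 0
    while i < len(phones):
        pair = tuple(phones[i:i + 2])
        if len(pair) == 2 and pair in frequent:
            c, v = pair
            out.append(f"{c}_in_{c}{v}")  # consonant allophone of this pair
            out.append(f"{v}_in_{c}{v}")  # vowel allophone of this pair
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return out


# Toy lexicon, for illustration only.
lexicon = [
    ["k", "a", "n", "o"],
    ["n", "o", "m", "i"],
    ["m", "o", "n", "o"],
    ["k", "o", "t", "o"],
]

frequent = set(top_cv_pairs(lexicon, n=1))
relabeled = [relabel(p, frequent) for p in lexicon]
```

With `n=5`, this scheme would add at most ten symbols to the phone set, two per selected pair. In a real system the counts would presumably come from training transcripts rather than a lexicon, so that they reflect how often each pair is actually spoken.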


Bibliographic reference. Nguyen, Long / Guo, Xuefeng / Makhoul, John (2002): "Modeling frequent allophones in Japanese speech recognition", in ICSLP-2002, 709-712.