14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Automatic Estimation of Dialect Mixing Ratio for Dialect Speech Recognition

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

Kyoto University, Japan

This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) of dialect speech. Training a language model on a dialect-mixed corpus has been reported to be effective for handling ASR across various dialects. One reason is the geographical continuity of spoken dialects: we regard a speaker's dialect as a mixture of several dialects. The mixing ratio depends on the speaker and also changes from moment to moment. Recognition accuracy can be improved by supplying a mixing ratio appropriate to the speaker's dialect. Because the mixing ratio is generally unknown, it must be estimated and updated from the input utterances. We examine two methods for updating it based on recognition results: one computes the contribution of each dialect to every recognized word, and the other predicts the mixture from the whole recognized sentence using topic modeling. Experimental results show that the mixing ratios estimated by these methods achieved higher recognition accuracy than a fixed mixing ratio.
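As a rough illustration of the first idea (computing the contribution of each dialect to every recognized word), the following is a minimal sketch and not the authors' implementation. It assumes each dialect has a simple unigram word-probability table and applies one EM-style re-estimation step of the mixing ratio; the function name `update_mixing_ratio`, the `floor` probability for unseen words, and the toy vocabulary are all hypothetical.

```python
from typing import Dict, List


def update_mixing_ratio(
    words: List[str],
    dialect_unigrams: Dict[str, Dict[str, float]],
    ratio: Dict[str, float],
    floor: float = 1e-6,  # assumed probability for words unseen in a dialect
) -> Dict[str, float]:
    """Re-estimate the dialect mixing ratio from recognized words.

    Each word's contribution to a dialect is that dialect's share of the
    ratio-weighted word probability; the new ratio is the average
    contribution over all words (one EM-style step).
    """
    if not words:
        return dict(ratio)  # nothing recognized: keep the current ratio

    contributions = {d: 0.0 for d in ratio}
    for w in words:
        # Ratio-weighted probability of the word under each dialect model.
        scores = {d: ratio[d] * dialect_unigrams[d].get(w, floor) for d in ratio}
        total = sum(scores.values())
        for d in ratio:
            contributions[d] += scores[d] / total

    return {d: contributions[d] / len(words) for d in ratio}


# Toy data (hypothetical): two dialects with tiny unigram tables.
unigrams = {
    "common": {"arigatou": 0.5, "dame": 0.5},
    "kansai": {"ookini": 0.5, "akan": 0.5},
}
initial_ratio = {"common": 0.5, "kansai": 0.5}
recognized = ["ookini", "akan", "dame"]

new_ratio = update_mixing_ratio(recognized, unigrams, initial_ratio)
print(new_ratio)
```

In this toy run, two of the three recognized words occur only in the hypothetical "kansai" table, so the updated ratio shifts toward that dialect. In a real system the updated ratio would presumably reweight the dialect language models before the next utterance is decoded, but how the paper does this is not stated in the abstract.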


Bibliographic reference.  Hirayama, Naoki / Yoshino, Koichiro / Itoyama, Katsutoshi / Mori, Shinsuke / Okuno, Hiroshi G. (2013): "Automatic estimation of dialect mixing ratio for dialect speech recognition", In INTERSPEECH-2013, 1492-1496.