ISCA Archive Interspeech 2005

Advances in word based dialect/accent classification

Rongqing Huang, John H. L. Hansen

In an earlier study, we proposed a very effective dialect/accent classification algorithm named Word based Dialect Classification (WDC). WDC works well for large corpora and significantly outperforms traditional Large Vocabulary Continuous Speech Recognition (LVCSR) based systems, which are claimed to be the best performing systems for language identification. For a small training corpus, however, it is difficult to obtain a robust statistical model for each word and each dialect. Therefore, a Context Adapted Training (CAT) algorithm is formulated here, which adapts the universal phoneme GMMs to dialect-dependent word HMMs via linear regression. Evaluated on IViE, an 8-dialect British English corpus, the CAT-trained WDC system obtains a 35.5% relative classification error reduction over the baseline LVCSR system and a 20.2% relative classification error reduction over the basic WDC system.

doi: 10.21437/Interspeech.2005-709

Cite as: Huang, R., Hansen, J.H.L. (2005) Advances in word based dialect/accent classification. Proc. Interspeech 2005, 2241-2244, doi: 10.21437/Interspeech.2005-709

  author={Rongqing Huang and John H. L. Hansen},
  title={{Advances in word based dialect/accent classification}},
  booktitle={Proc. Interspeech 2005},