Interspeech'2005 - Eurospeech
In an earlier study, we proposed a very effective dialect/accent classification algorithm named Word based Dialect Classification (WDC). WDC works well for large corpora and significantly outperforms traditional Large Vocabulary Continuous Speech Recognition (LVCSR) based systems, which are claimed to be the best-performing systems for language identification. For a small training corpus, however, it is difficult to obtain a robust statistical model for each word in each dialect. Therefore, a Context Adapted Training (CAT) algorithm is formulated here, which adapts universal phoneme GMMs to dialect-dependent word HMMs via linear regression. Evaluated on IViE, an 8-dialect British English corpus, the CAT-trained WDC system obtains a 35.5% relative classification error reduction over the baseline LVCSR system and a 20.2% relative classification error reduction over the basic WDC system.
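
The two relative error reductions quoted above can be related to each other with a little arithmetic. The following is a minimal sketch, assuming the standard definition of relative error reduction, (baseline error - new error) / baseline error, and assuming all three systems were scored on the same test set. The function names are illustrative and do not come from the paper, and the absolute error rates are not given in this abstract, so only ratios are used.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error reduction of a new system over a baseline (standard definition)."""
    return (baseline_error - new_error) / baseline_error


def implied_reduction(total: float, last_step: float) -> float:
    """Infer the reduction of an intermediate system B over a baseline A.

    total     : relative reduction of the final system C over A
    last_step : relative reduction of the final system C over B
    Uses err_C = (1 - total) * err_A and err_C = (1 - last_step) * err_B.
    """
    return 1.0 - (1.0 - total) / (1.0 - last_step)


# Figures quoted in the abstract: CAT-trained WDC vs. LVCSR, and vs. basic WDC.
cat_vs_lvcsr = 0.355
cat_vs_wdc = 0.202

# Implied relative reduction of the basic WDC system over the LVCSR baseline.
wdc_vs_lvcsr = implied_reduction(cat_vs_lvcsr, cat_vs_wdc)
print(f"{wdc_vs_lvcsr:.3f}")  # prints 0.192
```

Under these assumptions, the quoted figures imply that the basic WDC system already reduces the classification error by roughly 19% relative to the LVCSR baseline on this corpus, with CAT contributing the remaining improvement.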
Bibliographic reference. Huang, Rongqing / Hansen, John H. L. (2005): "Advances in word based dialect/accent classification", In INTERSPEECH-2005, 2241-2244.