INTERSPEECH 2004 - ICSLP
In this paper, we present our recent work in the analysis and modeling of speech under dialect. Dialect and accent significantly influence automatic speech recognition performance, and therefore it is critical to detect and classify non-native speech. In this study, we consider three areas that include: (i) prosodic structure (normalized f0, syllable rate, and sentence duration), (ii) phoneme acoustic space modeling and sub-word classification, and (iii) word-level based modeling using large vocabulary data. The corpora used in this study include: the NATO N-4 corpus (2 accents, 2 dialects of English), TIMIT (7 dialect regions), and American and British English versions of the WSJ corpus. These corpora were selected because the contained audio material from specific dialects/accents of English (N- 4), were phonetically balanced and organized across U.S. (TIMIT), or contained significant amounts of read audio material from distinct dialects (WSJ). The results show that significant changes occur at the prosodic, phoneme space, and word levels for dialect analysis, and that effective dialect classification can be achieved using processing strategies from each domain.
Bibliographic reference. Hansen, John H. L. / Yapanel, Umit / Huang, Rongqing / Ikeno, Ayako (2004): "Dialect analysis and modeling for automatic classification", In INTERSPEECH-2004, 1569-1572.