China has hundreds of dialects. Traditional dialectology classifies them into seven major dialect regions, most of which contain many sub-dialects and sub-sub-dialects. Because the dialects differ in various linguistic aspects, people from different dialect regions often cannot communicate orally. The sub-dialects of a single dialect region, although sometimes still mutually unintelligible, share more common features. In this paper, a dialect pronunciation structure, which was used successfully for dialect-based speaker classification in our previous work [1], is examined for the tasks of speaker classification and distance measurement among cities based on sub-dialects of Mandarin. Using the finals of dialectal utterances of a specific list of written characters, a dialect pronunciation structure is built for every speaker in a data set, and the speakers are classified based on the distances among their structures. The results of classifying 16 Mandarin speakers by sub-dialect show that they are classified linguistically, with little influence from age or gender. Finally, distances among sub-sub-dialects are calculated in the same way and evaluated. All results show high validity and agree with linguistic studies.
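The structure-based comparison described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each final is represented by a single hypothetical feature vector, distances between finals and between structures are plain Euclidean-style distances, and the names `pronunciation_structure` and `structure_distance` are invented for illustration. The paper's actual acoustic features and distance measures may differ.

```python
import math
from itertools import combinations


def pronunciation_structure(finals):
    """Return the pairwise distances among one speaker's finals.

    `finals` maps a final label to a feature vector (hypothetical input).
    Only relative distances are kept, so a constant shift of all
    vectors leaves the structure unchanged.
    """
    labels = sorted(finals)
    return [math.dist(finals[a], finals[b]) for a, b in combinations(labels, 2)]


def structure_distance(s1, s2):
    """Root-mean-square difference between two equal-length structures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)) / len(s1))


# Toy data (assumed, not from the paper): speaker B is speaker A shifted
# by a constant offset; speaker C has a different arrangement of finals.
speakers = {
    "A": {"a": (0.0, 0.0), "i": (3.0, 0.0), "u": (0.0, 4.0)},
    "B": {"a": (10.0, 10.0), "i": (13.0, 10.0), "u": (10.0, 14.0)},
    "C": {"a": (0.0, 0.0), "i": (1.0, 0.0), "u": (0.0, 1.0)},
}

structures = {name: pronunciation_structure(f) for name, f in speakers.items()}

# For each speaker, find the other speaker with the closest structure.
nearest = {
    name: min(
        (other for other in structures if other != name),
        key=lambda other: structure_distance(structures[name], structures[other]),
    )
    for name in structures
}
```

In this toy setup, speakers A and B end up with identical structures despite the constant offset between their feature vectors, which loosely mirrors the idea that the classification depends on the pronunciation pattern rather than on speaker-specific characteristics.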
Cite as: Ma, X., Nemoto, A., Minematsu, N., Qiao, Y., Hirose, K. (2009) Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese. Proc. Interspeech 2009, 2219-2222, doi: 10.21437/Interspeech.2009-631
@inproceedings{ma09c_interspeech,
  author={Xuebin Ma and Akira Nemoto and Nobuaki Minematsu and Yu Qiao and Keikichi Hirose},
  title={{Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2219--2222},
  doi={10.21437/Interspeech.2009-631}
}