Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study

Han-Chi Hsieh, Wei-Zhong Zheng, Ko-Chiang Chen, Ying-Hui Lai


Consonants are an important element of Mandarin, and producing different categories of consonants gives rise to different facial expressions. Specifically, the facial muscles change during speech, and these changes are closely related to pronunciation: the facial muscles are linked to the hidden articulators, and their effect on the face can be observed as 3D changes. Most studies, however, use 2D images to analyze facial features during speech. 2D images provide information in only two dimensions (x- and y-axes), so subtle depth motions (z-axis changes) of the facial muscles during speech can be difficult to detect accurately. Hence, this study used depth features of the face (point cloud features), recorded by a time-of-flight 3D camera, to investigate their potential for consonant recognition. We propose an algorithm that recognizes seven categories of Mandarin consonants from the depth features of the speaker’s face. The proposed system yielded suitable classification accuracy for these seven categories. This result implies that depth features can be used for speech-processing applications.


DOI: 10.21437/Interspeech.2019-1893

Cite as: Hsieh, H., Zheng, W., Chen, K., Lai, Y. (2019) Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study. Proc. Interspeech 2019, 2300-2304, DOI: 10.21437/Interspeech.2019-1893.


@inproceedings{Hsieh2019,
  author={Han-Chi Hsieh and Wei-Zhong Zheng and Ko-Chiang Chen and Ying-Hui Lai},
  title={{Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2300--2304},
  doi={10.21437/Interspeech.2019-1893},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1893}
}