SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese

Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li


We present SingaKids-Mandarin, a speech corpus of 255 Singaporean children aged 7 to 12 reading Mandarin Chinese, totaling 125 hours of data (75 hours of speech) and 79,843 utterances. The corpus is phonetically balanced and comes with detailed human annotations, including phonetic transcriptions, lexical tone markings, and utterance-level proficiency scores. The reading scripts span a diverse set of utterance styles: syllable-level minimal pairs, words, phrases, sentences, and short stories. We analyze the acoustic properties of Singaporean children's speech. We also observe that although Singaporean adults and children both lack the neutral tone, their phonetic pronunciation patterns differ: whereas Singaporean adults tend to front their retroflex, nasal, and palatal consonants, Singaporean children show both fronting and backing in these consonants. In future work, we plan to develop computer-assisted pronunciation training (CAPT) systems with SingaKids-Mandarin.


DOI: 10.21437/Interspeech.2016-139

Cite as

Chen, N.F., Tong, R., Wee, D., Lee, P., Ma, B., Li, H. (2016) SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese. Proc. Interspeech 2016, 1545-1549.

BibTeX
@inproceedings{Chen+2016,
author={Nancy F. Chen and Rong Tong and Darren Wee and Peixuan Lee and Bin Ma and Haizhou Li},
title={SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-139},
url={http://dx.doi.org/10.21437/Interspeech.2016-139},
pages={1545--1549}
}