NITK Kids’ Speech Corpus

Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi


This paper introduces a speech database for analyzing children’s speech. The database is recorded in Kannada (one of the South Indian languages) from children between 2.5 and 6.5 years of age, and is named the National Institute of Technology Karnataka Kids’ Speech Corpus (NITK Kids’ Speech Corpus). The relevant design considerations for the database collection are discussed in detail. The corpus is divided into four age groups, each spanning one year. It includes nearly 10 hours of speech recordings from 160 children; for each age group, data is recorded from 40 children (20 male and 20 female). Further, the effect of developmental changes on speech from 2.5 to 6.5 years is analyzed using pitch and formant analysis. Potential applications of the NITK Kids’ Speech Corpus, such as systematic study of the language learning ability of children, phonological process analysis, and children’s speech recognition, are discussed.


DOI: 10.21437/Interspeech.2019-2061

Cite as: Ramteke, P.B., Supanekar, S., Hegde, P., Nelson, H., Aithal, V., Koolagudi, S.G. (2019) NITK Kids’ Speech Corpus. Proc. Interspeech 2019, 331-335, DOI: 10.21437/Interspeech.2019-2061.


@inproceedings{Ramteke2019,
  author={Pravin Bhaskar Ramteke and Sujata Supanekar and Pradyoth Hegde and Hanna Nelson and Venkataraja Aithal and Shashidhar G. Koolagudi},
  title={{NITK Kids’ Speech Corpus}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={331--335},
  doi={10.21437/Interspeech.2019-2061},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2061}
}