Konstanz prosodically annotated infant-directed speech corpus (KIDS corpus)

Katharina Zahner, Muna Schönhuber, Janet Grijzenhout, Bettina Braun

Knowing infants’ input is a prerequisite for modern theories of first language acquisition. Here, we present the first prosodically annotated infant-directed speech corpus in German (KIDS corpus) – a tool for formulating hypotheses and modeling acquisition processes in the prosodic domain and at the prosody-syntax interface. The multi-layered corpus consists of 524 intonation phrases (IPs) directed to infants younger than one year (196 IPs extracted from the CHILDES database; 328 IPs from our own recordings). Pitch accents (n=832) and boundary tones (n=1048) were labeled according to GToBI. Furthermore, we annotated the presence of unstressed syllables and pitch targets before and after the accented syllable. We also tagged the word-prosodic structure of all accented words and the syntactic category of both accented and unaccented words. Results showed that 41% of the lexical and function words carried a pitch accent. Within the corpus, most words were verbs, but the words that bore a pitch accent were most often nouns. The majority of phrases started and ended in low boundary tones. The most frequent pitch accent types were H* and L+H*. The data are discussed in terms of elicitation setting and potential implications for first language acquisition mechanisms.

DOI: 10.21437/SpeechProsody.2016-115

Zahner, K., Schönhuber, M., Grijzenhout, J., Braun, B. (2016) Konstanz prosodically annotated infant-directed speech corpus (KIDS corpus). Proc. Speech Prosody 2016, 562-566.

