Child-directed spoken data is the ideal source of evidence for claims
about children’s linguistic environments. However, phonological
transcriptions of child-directed speech are scarce compared to sources
such as adult-directed speech or text data. Acquiring reliable descriptions
of children’s phonological environments from more readily accessible
sources would mean considerable savings of time and money. The first
step towards this goal is to quantify the reliability of descriptions
derived from such secondary sources.
We investigate how
phonological distributions vary across different modalities (spoken
vs. written), and across the age of the intended audience (children
vs. adults). Using a previously unseen collection of Swedish adult-
and child-directed spoken and written data, we combine lexicon look-up
and grapheme-to-phoneme conversion to approximate phonological characteristics.
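The combination of lexicon look-up and grapheme-to-phoneme (G2P) conversion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the lexicon `TOY_LEXICON`, its phoneme symbols, and the letter-by-letter fallback `naive_g2p` are hypothetical stand-ins for a full Swedish pronunciation lexicon and a trained G2P model. Each word is looked up first, and only out-of-lexicon words are passed to the fallback converter.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: lower-cased orthographic word -> phoneme symbols.
# The symbols are illustrative only; a real setup would use a full Swedish
# pronunciation lexicon with a consistent phoneme inventory.
TOY_LEXICON: Dict[str, List[str]] = {
    "hej": ["h", "E", "j"],
    "och": ["O", "k"],
}


def naive_g2p(word: str) -> List[str]:
    """Hypothetical placeholder G2P: one 'phoneme' per letter.

    A real pipeline would call a trained grapheme-to-phoneme model here.
    """
    return list(word)


def transcribe(
    tokens: Sequence[str],
    lexicon: Dict[str, List[str]] = TOY_LEXICON,
    g2p: Callable[[str], List[str]] = naive_g2p,
) -> Tuple[List[List[str]], int]:
    """Return one phoneme list per token and the number of lexicon misses."""
    words: List[List[str]] = []
    misses = 0
    for token in tokens:
        key = token.lower()
        if key in lexicon:
            # Prefer the curated lexicon entry when one exists.
            words.append(list(lexicon[key]))
        else:
            # Fall back to automatic conversion for out-of-lexicon words.
            misses += 1
            words.append(g2p(key))
    return words, misses
```

For example, `transcribe(["Hej", "katt"])` returns `([['h', 'E', 'j'], ['k', 'a', 't', 't']], 1)`: the first word comes from the lexicon and the second from the fallback. Keeping one phoneme list per word preserves word boundaries, so phoneme sequences can later be counted either within or across words.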
The analysis reveals distributional differences across datasets, both
for single phonemes and for longer phoneme sequences. Some of these
differences can, as expected, be attributed to lexical and contextual
characteristics of text vs. speech.
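Distributional differences for single phonemes and longer phoneme sequences can be illustrated by comparing relative n-gram frequencies across two datasets. The sketch below assumes Jensen-Shannon divergence (base 2, so values fall between 0 for identical and 1 for disjoint distributions) as the comparison measure; this choice is an illustrative assumption, not necessarily the measure used in the study. Setting `n=1` compares single phonemes, and larger `n` compares phoneme sequences.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Ngram = Tuple[str, ...]


def ngram_distribution(words: Sequence[Sequence[str]], n: int) -> Dict[Ngram, float]:
    """Relative frequencies of phoneme n-grams, counted within words."""
    counts: Counter = Counter()
    for phonemes in words:
        # Words shorter than n contribute no n-grams.
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def js_divergence(p: Dict[Ngram, float], q: Dict[Ngram, float]) -> float:
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(dist: Dict[Ngram, float]) -> float:
        # Kullback-Leibler divergence from dist to the mixture m.
        return sum(v * math.log2(v / m[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

In practice, one would compute `ngram_distribution` separately for, say, child-directed speech and for transcribed child-directed text, and report `js_divergence` between them for each `n`.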
The generated phonological
transcriptions are remarkably reliable. The differences in phonological
distributions between child-directed speech and secondary sources highlight
a need for compensatory measures when relying on written data or on
adult-directed spoken data, and/or for continued collection of actual
child-directed speech in research on children’s language environments.
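One common way to quantify how reliable generated transcriptions are is the phoneme error rate: the Levenshtein edit distance between a reference (manual) transcription and the generated one, divided by the reference length. The abstract does not state how reliability was measured, so the sketch below is an assumed, illustrative metric, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Total edit distance divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned word by word")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0
```

For instance, a generated `k a t t` against a reference `k a t` contains one insertion, giving a phoneme error rate of 1/3.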
Cite as: Strömbergsson, S., Edlund, J., Götze, J., Björkenstam, K.N. (2017) Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts. Proc. Interspeech 2017, 2213-2217, doi: 10.21437/Interspeech.2017-1634
@inproceedings{strombergsson17_interspeech, author={Sofia Strömbergsson and Jens Edlund and Jana Götze and Kristina Nilsson Björkenstam}, title={{Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={2213--2217}, doi={10.21437/Interspeech.2017-1634} }