Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts

Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam


Child-directed spoken data is the ideal source of support for claims about children’s linguistic environments. However, phonological transcriptions of child-directed speech are scarce compared to sources such as adult-directed speech or text data. Acquiring reliable descriptions of children’s phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources.

We investigate how phonological distributions vary across modalities (spoken vs. written) and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these differences can predictably be attributed to lexical and contextual characteristics of text vs. speech.
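The combination of lexicon look-up and grapheme-to-phoneme conversion, followed by counting phonemes and phoneme sequences, might be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy `LEXICON`, its transcriptions, the letter-per-phoneme `naive_g2p` fallback, and the function names are all hypothetical stand-ins for a real pronunciation lexicon and a trained grapheme-to-phoneme model.

```python
from collections import Counter

# Hypothetical toy pronunciation lexicon (orthographic word -> phoneme list).
# The transcriptions are illustrative only, not taken from the paper's resources.
LEXICON = {
    "en": ["e", "n"],
    "katt": ["k", "a", "t"],
    "sover": ["s", "o", "v", "e", "r"],
}


def naive_g2p(word):
    """Placeholder fallback: one phoneme per letter, with doubled letters collapsed.

    A real system would use a trained grapheme-to-phoneme model here.
    """
    phonemes = []
    for ch in word:
        if not phonemes or phonemes[-1] != ch:
            phonemes.append(ch)
    return phonemes


def transcribe(tokens, lexicon=LEXICON, g2p=naive_g2p):
    """Look each token up in the lexicon; fall back to G2P for unknown words."""
    return [lexicon.get(tok.lower()) or g2p(tok.lower()) for tok in tokens]


def ngram_distribution(transcriptions, n=1):
    """Relative frequencies of within-word phoneme n-grams (n=1: single phonemes)."""
    counts = Counter()
    for phonemes in transcriptions:
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


if __name__ == "__main__":
    words = transcribe(["En", "katt", "sover", "gott"])  # "gott" is out of lexicon
    print(words)
    print(ngram_distribution(words, n=1))  # single-phoneme distribution
    print(ngram_distribution(words, n=2))  # phoneme-bigram distribution
```

Computing such distributions separately for each dataset (for example child-directed speech vs. adult-directed text) would give the per-dataset profiles that can then be compared. Counting n-grams within words only is a simplification chosen for this sketch.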

The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children’s language environments.


DOI: 10.21437/Interspeech.2017-1634

Cite as: Strömbergsson, S., Edlund, J., Götze, J., Björkenstam, K.N. (2017) Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts. Proc. Interspeech 2017, 2213-2217, DOI: 10.21437/Interspeech.2017-1634.


@inproceedings{Strömbergsson2017,
  author={Sofia Strömbergsson and Jens Edlund and Jana Götze and Kristina Nilsson Björkenstam},
  title={Approximating Phonotactic Input in Children’s Linguistic Environments from Orthographic Transcripts},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2213--2217},
  doi={10.21437/Interspeech.2017-1634},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1634}
}