Both perceptual and acoustic studies of children’s speech independently
suggest that phonological contrasts are continuously refined during
acquisition. This paper considers two traditional acoustic features
for the ‘s’-vs.-‘sh’ contrast (centroid and
peak frequencies) and a novel feature learned from data, evaluating
these features relative to perceptual ratings of children’s productions.
Productions of sibilant fricatives were elicited from 16 adults
and 69 preschool children. A second group of adults rated the children’s
productions on a visual analog scale (VAS). Each production was rated
by multiple listeners; the mean VAS score for each production was used
as its perceptual goodness rating. For each production from the repetition
task, a psychoacoustic spectrum was estimated by passing it through
a filter bank that modeled the auditory periphery. From these spectra,
two traditional features for a sibilant fricative’s place of articulation
were computed: centroid and peak frequency. A novel acoustic
measure was derived by inputting the spectra to a graph-based dimensionality-reduction
algorithm.
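
As a rough illustration of the two traditional features, the sketch below computes a centroid (the magnitude-weighted mean frequency) and a peak frequency (the frequency of the largest spectral value) from a spectrum given as parallel lists. The function name `centroid_and_peak`, the list-based representation, and the toy values are assumptions made for illustration; the abstract does not specify the filter bank's channel frequencies, its frequency scale, or the exact feature definitions used in the paper.

```python
from typing import Sequence, Tuple


def centroid_and_peak(freqs: Sequence[float],
                      spectrum: Sequence[float]) -> Tuple[float, float]:
    """Return (centroid, peak) frequency for one spectrum.

    freqs    -- center frequency of each channel (hypothetical units: Hz)
    spectrum -- non-negative magnitude in each channel
    """
    if len(freqs) != len(spectrum) or len(freqs) == 0:
        raise ValueError("freqs and spectrum must be non-empty and equal in length")
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    # Centroid: mean frequency, weighted by the magnitude in each channel.
    centroid = sum(f * s for f, s in zip(freqs, spectrum)) / total
    # Peak: frequency of the channel with the largest magnitude.
    peak_index = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return centroid, freqs[peak_index]


# Toy four-channel spectrum (made-up values, not data from the study).
freqs = [2000.0, 4000.0, 6000.0, 8000.0]
spectrum = [1.0, 2.0, 5.0, 2.0]
centroid, peak = centroid_and_peak(freqs, spectrum)
print(centroid, peak)  # 5600.0 6000.0
```

In this toy case the centroid (5600 Hz) sits below the peak (6000 Hz) because the lower channels carry more energy than the single channel above the peak, which shows how the two features can diverge for an asymmetric spectrum.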
Simple regression analyses indicated that a greater amount of
variance in the VAS scores was explained by the novel feature (adjusted
R² = 0.569) than by either centroid (adjusted R²
= 0.468) or peak frequency (adjusted R² = 0.254).
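
For readers unfamiliar with the statistic reported above, the following sketch shows how an adjusted R² can be obtained for a simple (one-predictor) linear regression. It assumes ordinary least squares with an intercept and the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1) with p = 1; the function name `adjusted_r2_simple` and the toy data are hypothetical, and the abstract does not state which software or exact model specification the authors used.

```python
from typing import Sequence


def adjusted_r2_simple(x: Sequence[float], y: Sequence[float]) -> float:
    """Adjusted R^2 for an ordinary least-squares fit of y on one predictor x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # With a single predictor, R^2 equals the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    p = 1  # number of predictors in a simple regression
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Toy data (made-up values): one acoustic feature vs. a mean rating.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
rating = [1.0, 3.0, 2.0, 5.0, 4.0]
adj_r2 = adjusted_r2_simple(feature, rating)
print(round(adj_r2, 2))  # 0.52
```

Because each of the three models in the comparison has a single predictor and is fit to the same productions, the adjustment penalizes them equally, so their ranking by adjusted R² matches their ranking by unadjusted R².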
Cite as: Reidy, P.F., Beckman, M.E., Edwards, J., Munson, B. (2017) A Data-Driven Approach for Perceptually Validated Acoustic Features for Children’s Sibilant Fricative Productions. Proc. Interspeech 2017, 1750-1754, doi: 10.21437/Interspeech.2017-1607
@inproceedings{reidy17_interspeech,
  author={Patrick F. Reidy and Mary E. Beckman and Jan Edwards and Benjamin Munson},
  title={{A Data-Driven Approach for Perceptually Validated Acoustic Features for Children’s Sibilant Fricative Productions}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1750--1754},
  doi={10.21437/Interspeech.2017-1607}
}