A long-standing view in speech production research posits that articulatory representations are low-dimensional, and conceptual and computational models have been built on this view. In this work, we explore the nature of low-dimensional representations derived directly from articulatory signals under sparsity constraints. Specifically, we present a method to examine how well derived representations of “primitive movements” of speech articulation can be used to classify broad phone categories. We first extract these spatiotemporal primitives from a matrix of human speech articulation data using a weakly supervised learning method that seeks a part-based representation of the data in terms of basis units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases at different time lags. We show that this feature, derived entirely from the activations of these primitive movements, achieves an accuracy of about 80% on an interval-based phone classification task. We discuss the implications of these findings for our understanding of speech signal representations.
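The abstract does not give the exact definition of the co-occurrence feature, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the activations are stored as a non-negative matrix `H` with one row per primitive and one column per frame. For each lag, it sums the products of activation pairs within the phone interval. The function name `cooccurrence_feature`, the lag set, and the sum-to-one normalization are all hypothetical choices made for illustration.

```python
import numpy as np

def cooccurrence_feature(H, start, end, lags=(0, 1, 2)):
    """Lagged co-occurrence feature for one phone interval (illustrative sketch).

    H          : (K, T) activation matrix, one row per primitive (assumed layout).
    start, end : frame indices of the interval, end exclusive.
    lags       : hypothetical set of time lags, in frames.
    """
    seg = np.asarray(H, dtype=float)[:, start:end]
    K, n = seg.shape
    blocks = []
    for lag in lags:
        if lag >= n:
            # Interval is shorter than the lag: no frame pairs to count.
            blocks.append(np.zeros((K, K)))
            continue
        # C[i, j] = sum over t of seg[i, t] * seg[j, t + lag]
        C = seg[:, : n - lag] @ seg[:, lag:].T
        blocks.append(C)
    feat = np.concatenate([b.ravel() for b in blocks])
    # Assumed normalization: scale to unit sum so interval length matters less.
    total = feat.sum()
    return feat / total if total > 0 else feat

# Toy example: 2 primitives, 3 frames, lags 0 and 1.
H_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
feat_toy = cooccurrence_feature(H_toy, 0, 3, lags=(0, 1))
```

Under these assumptions, each interval yields a fixed-length vector of size `K * K * len(lags)` regardless of its duration, which is what lets intervals of different lengths be passed to a standard classifier.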
Index Terms— speech communication, movement primitives, phone classification, motor theory, information transfer.
Cite as: Ramanarayanan, V., Van Segbroeck, M., Narayanan, S.S. (2013) On the nature of data-driven primitive representations of speech articulation. Proc. Speech Production in Automatic Speech Recognition (SPASR-2013), 16-21.
@inproceedings{ramanarayanan13_spasr,
  author={Vikram Ramanarayanan and Maarten Van Segbroeck and Shrikanth S. Narayanan},
  title={{On the nature of data-driven primitive representations of speech articulation}},
  year=2013,
  booktitle={Proc. Speech Production in Automatic Speech Recognition (SPASR-2013)},
  pages={16--21}
}