Some models of speech perception/production and language acquisition make use of a quasi-continuous representation of the acoustic speech signal. We investigate whether such models could profit from incorporating articulatory information in an analogous fashion. In particular, we examine how articulatory information, represented by electromagnetic articulography (EMA) measurements, can influence unsupervised phonetic speech categorization. Combining the acoustic signal with non-synthetic, raw articulatory data, we present first results of a clustering procedure similar to those applied in numerous language acquisition and speech perception models. We observe that unlabeled articulatory data, i.e. data without previously assumed landmarks, yield good clustering results. The finding that clustering is more effective for plosives than for vowels appears to support the motor view of speech perception.
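The unsupervised clustering setup described above can be illustrated with a minimal sketch. The example below is not the authors' procedure: it uses a plain k-means clusterer over synthetic placeholder feature vectors standing in for combined acoustic + articulatory (EMA) frames, merely to show how unlabeled data can be grouped into phone-like categories without predefined landmarks. All names, dimensions, and data here are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration only: k-means over synthetic feature vectors
# standing in for combined acoustic + articulatory (EMA) frames.
# The actual paper uses real EMA recordings, not this toy data.

def kmeans(X, k, n_iter=50, seed=0):
    """Basic k-means: returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assign each frame to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update each center to the mean of its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# two well-separated synthetic "phone categories" in a toy 4-dim
# acoustic+articulatory feature space (placeholder values)
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, size=(50, 4))
b = rng.normal(10.0, 0.1, size=(50, 4))
X = np.vstack([a, b])

labels, centers = kmeans(X, k=2)
print(np.unique(labels[:50]), np.unique(labels[50:]))
```

With clearly separated categories, each synthetic group ends up in a single cluster, mirroring the idea that phonetic categories can emerge from unlabeled data alone.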
Bibliographic reference. Duran, Daniel / Bruni, Jagoda / Dogil, Grzegorz / Schütze, Hinrich (2011): "Speech events are recoverable from unlabeled articulatory data: using an unsupervised clustering approach on data obtained from electromagnetic midsaggital articulography (EMA)", In INTERSPEECH-2011, 2201-2204.