Speech acoustic patterns vary significantly as a result of coarticulation and lenition processes that are shaped by segmental context or by performance factors such as production rate and degree of casualness. The resulting acoustic variability continues to pose serious challenges for the development of automatic speech recognition (ASR) systems. Articulatory phonology provides a formalism for understanding coarticulation through spatiotemporal changes in the patterns of underlying gestures. This paper studies the coarticulation occurring in certain fast-spoken utterances using articulatory constriction tract variables (TVs) estimated from acoustic features. The TV estimators are trained on the University of Wisconsin X-ray Microbeam (XRMB) database. The utterances analyzed are from a different corpus containing simultaneous acoustic and Electromagnetic Articulograph (EMA) data. Plots of the estimated TVs show that the estimation procedure successfully detected the articulatory constrictions even in highly coarticulated utterances that a state-of-the-art phone recognition system failed to detect. These results highlight the potential of TV trajectory estimation methods for improving the performance of phone recognition systems, particularly when sounds are reduced or deleted.
Bibliographic reference. Sivaraman, Ganesh / Mitra, Vikramjit / Tiede, Mark K. / Saltzman, Elliot / Goldstein, Louis / Espy-Wilson, Carol (2015): "Analysis of coarticulated speech using estimated articulatory trajectories", In INTERSPEECH-2015, 369-373.