Phonological feature spaces have been proposed as a representation for acoustic models in automatic speech recognition (ASR) tasks. The most successful methods for detecting articulatory gestures from the speech signal are based on Time Delay Neural Networks (TDNN). Stochastic Finite-State Automata have been used effectively in many speech-input natural language tasks. They are versatile models with well-established learning algorithms, and they can easily be combined with other models. A two-level finite-state model has also been proposed to classify articulatory features; however, in that case a strong discretization procedure was required. In this work we propose a hierarchical finite-state model that considers two representation spaces, based on phonological features and on acoustic parameters, respectively. This model was evaluated in a phonological feature identification task over a Spanish corpus. Experimental results show better frame classification accuracy than discrete models. Moreover, some specific articulations are better identified by the proposed models than by TDNN, leading to higher phone identification rates at the frame level.
Bibliographic reference. Olaso, Javier Mikel / Torres, María Inés (2013): "Hierarchical models based on a continuous acoustic space to identify phonological features", In INTERSPEECH-2013, 1771-1775.