This paper describes how non-linear formant trajectories, based on the 'trajectory HMM' proposed by Tokuda et al., can be exploited within the framework of multiple-level segmental HMMs. In the resulting model, called a non-linear/linear multiple-level segmental HMM, speech dynamics are modeled as smooth non-linear trajectories in a formant-based intermediate layer. These formant trajectories are mapped into the acoustic layer by a set of one or more linear mappings. The N-best rescoring paradigm is used to evaluate the non-linear formant trajectories. Rescoring results on the TIMIT corpus show that introducing non-linear formant trajectories improves phone recognition accuracy compared with linear trajectories.
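As an illustration of the N-best rescoring paradigm mentioned above, the following minimal Python sketch re-ranks a list of first-pass hypotheses with a second model's score. It is not the authors' implementation: the function name `rescore_nbest`, the additive combination of the two log scores, the `weight` parameter, and the toy phone sequences and score values are all assumptions made for illustration. In the setting described here, the second-pass scorer would be the segmental HMM's log-likelihood for each hypothesized phone sequence.

```python
from typing import Callable, List, Tuple

Phones = Tuple[str, ...]
Hypothesis = Tuple[Phones, float]  # (phone sequence, first-pass log score)


def rescore_nbest(
    nbest: List[Hypothesis],
    segment_score: Callable[[Phones], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank an N-best list, best hypothesis first.

    Assumption: the combined score is the first-pass log score plus a
    weighted second-pass log score. Other combinations are possible.
    """
    rescored = [
        (phones, first_pass + weight * segment_score(phones))
        for phones, first_pass in nbest
    ]
    # Higher log score is better, so sort in descending order.
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


# Hypothetical 3-best list; the scores are made up for illustration.
nbest = [
    (("sil", "k", "ae", "t", "sil"), -120.0),
    (("sil", "k", "ah", "t", "sil"), -118.5),
    (("sil", "g", "ae", "t", "sil"), -125.0),
]

# Stand-in for a segmental-HMM scorer: a lookup table of made-up values.
toy_scores = {
    ("sil", "k", "ae", "t", "sil"): -40.0,
    ("sil", "k", "ah", "t", "sil"): -45.0,
    ("sil", "g", "ae", "t", "sil"): -41.0,
}

rescored = rescore_nbest(nbest, lambda phones: toy_scores[phones])
for phones, score in rescored:
    print(" ".join(phones), score)
```

In this toy example the hypothesis ranked second by the first pass moves to the top after rescoring, which is the effect an N-best evaluation is meant to measure: any change in phone accuracy can be attributed to the second-pass model.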
Bibliographic reference. Hu, Hongwei / Russell, Martin J. (2008): "Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM", In INTERSPEECH-2008, 2422-2425.