9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Speech Recognition Using Non-Linear Trajectories in a Formant-Based Articulatory Layer of a Multiple-Level Segmental HMM

Hongwei Hu, Martin J. Russell

University of Birmingham, UK

This paper describes how non-linear formant trajectories, based on the 'trajectory HMM' proposed by Tokuda et al., can be exploited within the framework of multiple-level segmental HMMs. In the resulting model, termed a non-linear/linear multiple-level segmental HMM, speech dynamics are modeled as smooth non-linear trajectories in a formant-based intermediate layer. These formant trajectories are mapped into the acoustic layer using a set of one or more linear mappings. The N-best rescoring paradigm is used to evaluate the non-linear formant trajectories. Rescoring results on the TIMIT corpus show that non-linear formant trajectories improve phone recognition accuracy compared with linear trajectories.
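
For readers unfamiliar with the N-best rescoring paradigm mentioned above, the following is a minimal sketch of the general idea, not the authors' implementation. A first-pass recognizer is assumed to have produced a short list of candidate phone sequences; a second model then assigns each candidate a new log score and the list is re-ranked. The names `rescore_nbest` and `toy_score`, and the placeholder score values, are hypothetical and chosen only for illustration; in the setting of this paper the scoring function would be the log likelihood under the segmental HMM.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is a phone sequence, e.g. ("k", "ae", "t").
Hypothesis = Tuple[str, ...]


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    second_pass_score: Callable[[Hypothesis], float],
) -> List[Tuple[Hypothesis, float]]:
    """Score every hypothesis with the second-pass model and
    return (hypothesis, score) pairs sorted best first."""
    scored = [(hyp, second_pass_score(hyp)) for hyp in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder log scores standing in for a real model (assumption:
# these numbers are illustrative only, not results from the paper).
_TOY_LOG_SCORES = {
    ("k", "ae", "t"): -10.0,
    ("k", "eh", "t"): -12.5,
    ("g", "ae", "t"): -11.0,
}


def toy_score(hyp: Hypothesis) -> float:
    """Hypothetical stand-in for a segmental-model log likelihood."""
    return _TOY_LOG_SCORES[hyp]


# First-pass N-best list, in first-pass order.
nbest_list = [("k", "eh", "t"), ("g", "ae", "t"), ("k", "ae", "t")]
ranked = rescore_nbest(nbest_list, toy_score)
best_hypothesis = ranked[0][0]
```

Because the second-pass model only has to score a fixed list of candidates, this paradigm allows a model with expensive segment-level computations to be evaluated without building a full decoder for it. The achievable accuracy is bounded by the best hypothesis present in the N-best list.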

