The trajectory HMM has been shown to be useful for model-based speech synthesis, where a smoothed trajectory is generated using temporal constraints imposed by dynamic features. To evaluate the performance of such a model on an ASR task, we present a trajectory decoder based on tree search with delayed path merging. Experiments on a speaker-dependent phone recognition task using the MOCHA-TIMIT database show that the MLE-trained trajectory model, while retaining the attractive properties of a proper generative model, tends to favour over-smoothed trajectories among competing hypotheses, and does not perform better than a conventional HMM. We use this result to argue that models which fit the training data more closely may lose discriminative power by being too faithful to that data. This partially explains why alternative acoustic models that try to model temporal constraints explicitly do not achieve significant improvements in ASR.
Cite as: Zhang, L., Renals, S. (2006) Phone recognition analysis for trajectory HMM. Proc. Interspeech 2006, paper 1203-Mon3BuP.1, doi: 10.21437/Interspeech.2006-216
@inproceedings{zhang06c_interspeech,
  author={Le Zhang and Steve Renals},
  title={{Phone recognition analysis for trajectory HMM}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1203-Mon3BuP.1},
  doi={10.21437/Interspeech.2006-216}
}