For over a decade, the Hidden Markov Model (HMM) has been the primary tool for acoustic modeling in speech recognition. In this paper we examine a more general approach that uses a Partially Observable Markov Decision Process (POMDP) to model the base phonetic unit. We introduce the concept of multiple phonetic context classes, one for each of the infinitely many possible contexts a phoneme can appear in, and show how a POMDP can represent such a model. Much as tying mixtures at the state level across phonemes that share linguistic properties fills gaps in the model space caused by a lack of training data, the POMDP model can fill in additional gaps, in effect adding a second level of clustering driven by the data itself.
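As a point of reference for the state-level mixture tying mentioned above, here is a minimal sketch of the conventional tied-mixture emission density, in which several HMM states share one codebook of Gaussians and differ only in their mixture weights, so b_j(x) = sum_k c_jk N(x; mu_k, Sigma_k). It illustrates the standard tying technique only, not the POMDP model proposed in this paper; the function names, the diagonal-covariance assumption, and the toy values are all hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tied_emission_likelihood(x, codebook, weights):
    """Emission likelihood b_j(x) for one state.

    codebook: shared list of (mean, variance) vectors, reused by every tied state.
    weights:  this state's own mixture weights c_jk, one per codebook entry.
    """
    return sum(
        w * math.exp(log_gaussian(x, mean, var))
        for w, (mean, var) in zip(weights, codebook)
    )


# Hypothetical example: two states (e.g. from phonemes sharing a linguistic
# property) tie to the same two-component codebook but weight it differently.
codebook = [([0.0], [1.0]), ([3.0], [1.0])]
state_a_weights = [0.9, 0.1]
state_b_weights = [0.2, 0.8]

x = [0.0]
print(tied_emission_likelihood(x, codebook, state_a_weights))
print(tied_emission_likelihood(x, codebook, state_b_weights))
```

Because the Gaussians are shared, a state with little training data only needs to estimate its weights, which is how tying helps fill gaps in the model space.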
Cite as: Jonas, M., Schmolze, J.G. (2005) Hierarchical clustering of mixture tying using a partially observable Markov decision process. Proc. Interspeech 2005, 2953-2956, doi: 10.21437/Interspeech.2005-127
@inproceedings{jonas05_interspeech, author={Michael Jonas and James G. Schmolze}, title={{Hierarchical clustering of mixture tying using a partially observable Markov decision process}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={2953--2956}, doi={10.21437/Interspeech.2005-127} }