Many current speech recognition systems use very large statistical models, with many thousands or even millions of parameters, to account for the variability in speech signals observed in large training corpora, and they represent speech as sequences of discrete, independent events. The mechanisms of speech production, however, are conceptually very simple: they involve the continuous, smooth movement of a small number of speech articulators. We report progress towards a practical implementation of a parsimonious continuous-state hidden Markov model that recovers voiced phoneme sequences from the trajectories of such continuous, dynamic speech production features, using of the order of several hundred parameters. We describe automated training of the parameters using a forced alignment procedure, and report results for training and testing on a single speaker.
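Forced alignment, used above for training, assigns each frame of an utterance to one phoneme of a known transcription. The sketch below is a generic illustration only. It is not the authors' continuous-state procedure, which the abstract does not describe. It performs a Viterbi-style dynamic programming pass over a left-to-right chain of discrete phoneme states. The function name `forced_align` and the per-frame score matrix `scores` (a log-likelihood of each frame under each phoneme of the transcription) are hypothetical assumptions made for the example.

```python
from typing import List, Sequence


def forced_align(scores: Sequence[Sequence[float]]) -> List[int]:
    """Generic (hypothetical) forced alignment by dynamic programming.

    scores[t][j] is an assumed log-likelihood of frame t under the j-th
    phoneme of the known transcription. Returns, for each frame, the index
    of the phoneme it is assigned to. Every phoneme gets at least one
    contiguous run of frames, in transcription order.
    """
    n_frames = len(scores)
    n_phones = len(scores[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phoneme")

    neg_inf = float("-inf")
    best = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the alignment must start in the first phoneme

    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = best[t - 1][j]  # remain in the same phoneme
            advance = best[t - 1][j - 1] if j > 0 else neg_inf  # move to the next
            if stay >= advance:
                best[t][j], back[t][j] = stay, j
            else:
                best[t][j], back[t][j] = advance, j - 1
            best[t][j] += scores[t][j]

    # The alignment must end in the last phoneme; trace the back-pointers.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Usage: three frames, two phonemes; the third frame matches phoneme 1 best.
print(forced_align([[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]]))  # [0, 0, 1]
```

The phoneme boundaries from such an alignment can then be used to collect per-phoneme training data. How the paper's continuous-state model uses its alignment is not specified here.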
Bibliographic reference. Houghton, S. M. / Champion, Colin J. / Weber, Philip (2015): "Recognition of voiced sounds with a continuous state HMM", In INTERSPEECH-2015, 523-527.