Third International Conference on Spoken Language Processing (ICSLP 94)
We are experimenting with an approach to connectionist speech recognition that models the dynamics within a speech segment using temporal position as an explicit variable. Currently, the most common model of human speech production used in speech recognition is the Hidden Markov Model (HMM). However, HMMs suffer from well-known limitations, most notably the assumption that the observations generated in a given state are independent and identically distributed (i.i.d.). As an alternative, we are developing a time-index model that explicitly conditions the emission probability of a state on the time index, defined as the number of frames from entry into the state up to the current frame. Thus, the proposed model does not require the i.i.d. assumption. Our pilot results suggest that the time-index approach can greatly reduce error if we have good information about the phoneme boundary locations.
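To make the definition of the time index concrete, the following is a minimal sketch, not the authors' implementation. It assumes a frame-level state labeling is already available (for example, from known phoneme boundaries), that the index is counted from 0 at the frame where a state is entered, and that a change of label marks entry into a new state. The function name `time_indices` and the example labels are hypothetical.

```python
from typing import List


def time_indices(state_path: List[str]) -> List[int]:
    """Return, for each frame, the number of frames since its state was entered.

    Assumption: counting starts at 0 on the first frame of each state,
    and a state boundary is detected as a change in the frame label.
    """
    indices: List[int] = []
    for t, state in enumerate(state_path):
        if t > 0 and state == state_path[t - 1]:
            # Still in the same state: one more frame has elapsed.
            indices.append(indices[-1] + 1)
        else:
            # First frame overall, or a new state was just entered.
            indices.append(0)
    return indices


# Hypothetical frame-level labeling of a short utterance.
path = ["s", "s", "s", "iy", "iy", "s"]
pairs = list(zip(path, time_indices(path)))
```

Under these assumptions, each frame is paired with a (state, time index) value, and it is this pair, rather than the state alone, on which the emission probability would be conditioned. Because the index depends on where each state begins, errors in the boundary locations shift the index for every frame in the segment.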
Bibliographic reference. Konig, Yochai / Morgan, Nelson (1994): "Modeling dynamics in connectionist speech recognition - the time index model", In ICSLP-1994, 1523-1526.