Third International Conference on Spoken Language Processing (ICSLP 94)
We present a speech recognition model that uses continuous internal degrees of freedom (IDF) between acoustic observations and phonetic units. Trajectories in the internal space are assumed to play an important role in the synthetic transformation from phonetic transcriptions to observations. The phonetic transcription defines a series of target points for the trajectory, and the trajectory is constrained to be as continuous as possible. As a result, the model is shown to represent long-term correlations in the observations, which are not found in HMMs. We also examine a discrete model that approximates the internal space by a finite set of representative points. The resulting model is based on two mutually coupled HMMs. One is a discrete HMM (HMM-I) for a phonetic unit, which outputs a trajectory in the internal space. The other is a shared ergodic HMM (HMM-II) whose states are labeled by the output symbols of HMM-I. HMM-II encodes the acoustic constraints on the observations and accounts for the interactions between neighboring phones through its state transitions.
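The idea of a trajectory that passes near phonetic target points while staying as continuous as possible can be illustrated with a minimal sketch. This is not the paper's formulation: the one-dimensional internal space, the quadratic penalty, and the names `smooth_trajectory` and `lam` are all assumptions made here for illustration.

```python
def smooth_trajectory(targets, lam):
    """Return the trajectory x that minimizes
        sum_t (x[t] - targets[t])**2 + lam * sum_t (x[t+1] - x[t])**2.

    Illustrative assumption only: a 1-D internal space and a quadratic
    continuity penalty. The optimum solves a tridiagonal linear system,
    handled here with the Thomas algorithm.
    """
    n = len(targets)
    if n == 0:
        return []
    # Diagonal entries: 1 + lam * (number of neighboring frames).
    diag = [1.0 + lam * ((t > 0) + (t < n - 1)) for t in range(n)]
    rhs = [float(v) for v in targets]
    # Forward elimination; every off-diagonal entry equals -lam.
    for t in range(1, n):
        w = -lam / diag[t - 1]
        diag[t] -= w * (-lam)
        rhs[t] -= w * rhs[t - 1]
    # Back substitution.
    x = [0.0] * n
    x[n - 1] = rhs[n - 1] / diag[n - 1]
    for t in range(n - 2, -1, -1):
        x[t] = (rhs[t] + lam * x[t + 1]) / diag[t]
    return x


# Two hypothetical phones with targets 0 and 1, three frames each.
trajectory = smooth_trajectory([0, 0, 0, 1, 1, 1], lam=2.0)
```

Because all frames are coupled through the continuity term, each point of the trajectory depends on targets several frames away, which gives a rough intuition for the long-term correlation mentioned in the abstract. Larger values of `lam` yield smoother transitions between neighboring targets.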
Bibliographic reference. Iso, Ken-ichi (1994): "A speech recognition model using internal degrees of freedom", in ICSLP-1994, 1563-1566.