The distribution of feature vectors derived from short speech segments is modeled as a mixture of Gaussian densities. Each density corresponds to a phonetic state in the speech production process and is trained on the observed feature vectors derived from waveform segments corresponding to a specific phoneme. In this paper we propose that a short utterance of speech, such as a syllable, can be statically represented by a matrix of state transition probabilities, treating the utterance as a chain of discrete acoustic events. With these static models, we obtained very competitive recognition rates and speeds compared with recognition using Dynamic Programming and Hidden Markov Models. We are convinced that the proposed static model is more resilient to a lack of large amounts of training data and characterizes the dynamics of short utterances sufficiently for recognition purposes.
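As a rough illustration of the static representation described above, the following minimal sketch estimates a matrix of state transition probabilities from a sequence of frame-level state labels. It is not the authors' implementation: the function name `transition_matrix`, the example label sequence, and the choice to leave unvisited rows at zero are all assumptions made for illustration, and the step that assigns each frame to a Gaussian state is taken as already done.

```python
def transition_matrix(states, num_states):
    """Estimate row-normalized transition probabilities from a label sequence.

    `states` is a list of integer state indices, one per frame (assumed to
    come from assigning each feature vector to its most likely Gaussian).
    """
    # Count how often state i is immediately followed by state j.
    counts = [[0] * num_states for _ in range(num_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state with no outgoing transitions keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical frame labels for one short utterance with three states.
labels = [0, 0, 1, 1, 1, 2, 2]
model = transition_matrix(labels, 3)
```

The resulting matrix has a fixed size regardless of utterance length, which is what makes the representation static; how such matrices are compared during recognition is not specified in this abstract.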
Bibliographic reference. Chan, Chorkin / Bao, Jun / Wu, Jian-xiong (1989): "A preliminary study on the static representation of short-timed speech dynamics", In EUROSPEECH-1989, 1462-1465.