EUROSPEECH '91

This paper describes a matrix representation from which we can derive a new formulation of HMM-based speech recognition algorithms. This idea provides not only an alternative mathematical formulation equivalent to the conventional trellis and Viterbi algorithms, but also a better understanding of HMM algorithms under grammatical constraints, as well as more efficient computational possibilities. In this formulation, a likelihood matrix is defined as an (N + 1) × (N + 1) upper triangular matrix whose (t, s) component is the observation likelihood of the given signal in the time span between t + 1 and s. First, it is shown that the likelihood matrix for a pair of serially connected signal sources is the product of their matrices (P = P1 P2), and that a parallel connection is represented by the sum (P = P1 + P2). From these basic properties, matrix-based HMM computation algorithms are derived. Explicit duration control at all levels, such as state, phoneme, syllable, and word, can be done easily. Grammatical rewriting rules are directly interpreted as matrix operations. A matrix parser is suggested as a generalization of the CYK parser. This algorithm is particularly effective in large-vocabulary systems, where the same phone units (phonemes) appear in many syntactic paths.
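The serial and parallel properties stated in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`single_frame_matrix`, `serial`, `parallel`), the toy likelihood values, and the assumption that each source emits exactly one frame are all hypothetical choices made for illustration. Each matrix is (N + 1) × (N + 1) and upper triangular, with entry (t, s) holding the likelihood of frames t + 1 through s.

```python
def single_frame_matrix(frame_likelihoods):
    """Likelihood matrix of a hypothetical source that emits exactly one frame.

    frame_likelihoods[t] is the likelihood of frame t + 1, so the only
    nonzero entries are P[t][t + 1] (the span covering frame t + 1 alone).
    """
    n = len(frame_likelihoods)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for t in range(n):
        matrix[t][t + 1] = frame_likelihoods[t]
    return matrix


def serial(p1, p2):
    """Serial connection: matrix product, summing over every boundary u."""
    size = len(p1)
    out = [[0.0] * size for _ in range(size)]
    for t in range(size):
        for s in range(t, size):  # upper triangular: only s >= t
            out[t][s] = sum(p1[t][u] * p2[u][s] for u in range(t, s + 1))
    return out


def parallel(p1, p2):
    """Parallel connection (alternative sources): elementwise sum."""
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(p1, p2)]


# Toy per-frame likelihoods for two units over N = 3 frames (made-up values).
A = single_frame_matrix([0.5, 0.25, 0.5])
B = single_frame_matrix([0.25, 0.5, 0.5])

AB = serial(A, B)                        # unit A followed by unit B
A_or_B_then_B = serial(parallel(A, B), B)  # (A or B) followed by B
ABA = serial(serial(A, B), A)            # A, B, A covering all three frames
```

In this sketch, `ABA[0][3]` is the likelihood of the whole three-frame signal under the sequence A, B, A, and the rule "(A or B) followed by B" is evaluated once as a sum followed by a product, which equals the sum of the two expanded paths because matrix multiplication distributes over addition. A Viterbi-style variant would replace the sum over boundaries with a maximum; that substitution is an assumption here, not something the abstract spells out.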
Bibliographic reference. Sagayama, Shigeki (1991): "A matrix representation of HMM-based speech recognition algorithms", In EUROSPEECH-1991, 1225-1228.