5th International Conference on Spoken Language Processing, Sydney, Australia
Hidden Markov modeling of speech waveforms is studied and applied to speech recognition of clean and noisy signals. Signal vectors in each state are assumed to be Gaussian with zero mean and a Toeplitz covariance matrix. This model allows short signal vectors and is thus useful for speech signals with rapidly changing second-order statistics. It can also be adapted straightforwardly to noisy signals, especially when the noise is additive and independent of the signal. Since no closed-form solution exists for the maximum likelihood estimate of the Toeplitz covariance matrices, an expectation-maximization procedure is used and efficiently implemented. HMMs with Toeplitz as well as asymptotically Toeplitz (e.g., circulant, autoregressive) covariance matrices are studied theoretically and experimentally. While all of these matrices provide similar performance asymptotically, they differ significantly when the frame length is finite. Recognition results are provided for clean and noisy signals at 0-30 dB SNR.
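As a minimal sketch of the state model described above (not the authors' implementation), the following Python code evaluates the log-likelihood of one signal frame under a zero-mean Gaussian with a Toeplitz covariance matrix. It also illustrates the noise adaptation: for additive noise independent of the signal, the covariance of the noisy frame is the sum of the signal and noise covariances. The function names and all numerical values are hypothetical, chosen only for illustration, and the expectation-maximization estimation of the covariances is not shown.

```python
import numpy as np


def toeplitz_cov(r):
    """Build a symmetric Toeplitz covariance from autocovariance lags r[0..K-1]."""
    r = np.asarray(r, dtype=float)
    k = np.arange(len(r))
    # Entry (i, j) depends only on the lag |i - j|.
    return r[np.abs(k[:, None] - k[None, :])]


def log_likelihood(y, cov):
    """Log-density of a zero-mean Gaussian vector y with covariance cov."""
    y = np.asarray(y, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)


# Hypothetical autocovariance lags for one HMM state and for white noise.
r_speech = np.array([1.0, 0.6, 0.2, 0.05])
r_noise = np.array([0.5, 0.0, 0.0, 0.0])

cov_clean = toeplitz_cov(r_speech)
# Additive, independent noise: covariances (and hence lags) add.
cov_noisy = toeplitz_cov(r_speech + r_noise)

frame = np.array([0.3, -0.1, 0.4, 0.2])  # hypothetical short signal vector
print(log_likelihood(frame, cov_clean))
print(log_likelihood(frame, cov_noisy))
```

Because the sum of two Toeplitz matrices is again Toeplitz, the noisy-signal model keeps the same structure as the clean one in this sketch.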
Bibliographic reference. Roberts, William J.J. / Ephraim, Yariv (1998): "Robust speech recognition using HMM's with toeplitz state covariance matrices", In ICSLP-1998, paper 0141.