ISCA Archive ICSLP 2000

Robust speech recognition via modeling spectral coefficients with HMM's with complex Gaussian components

William J.J. Roberts, Sadaoki Furui

Robust speech recognition via hidden Markov modeling of spectral vectors is studied in this paper. The hidden Markov model (HMM) mixture components are assumed to be complex Gaussian with zero mean and diagonal covariance, and to incorporate an unknown scalar gain term. The gain term is associated with each spectral vector and models the varying energy of speech signals. It is estimated by applying the maximum likelihood (ML) criterion. On an isolated digit database, in clean conditions, the spectral modeling approach with ML gain estimation achieved performance similar to that of cepstral modeling of speech.
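
As a rough illustration of the ML gain estimate for a single mixture component, the sketch below assumes that each spectral coefficient y_k is zero-mean complex Gaussian with variance g * var_k, where g is the per-vector gain. This parameterization, the function names, and the example values are assumptions made for illustration and are not taken from the paper. Under that assumption, setting the derivative of the log-likelihood with respect to g to zero gives the closed form g = (1/K) * sum_k |y_k|^2 / var_k.

```python
import math

def complex_gaussian_loglik(y, var, gain):
    """Log-likelihood of a complex vector y under a zero-mean complex
    Gaussian with diagonal covariance gain * var[k].
    (Assumed parameterization: the gain scales the variances.)"""
    total = 0.0
    for yk, vk in zip(y, var):
        s = gain * vk
        total += -math.log(math.pi * s) - abs(yk) ** 2 / s
    return total

def ml_gain(y, var):
    """Closed-form ML gain under the assumed model:
    the average of |y_k|^2 / var_k over the K coefficients."""
    return sum(abs(yk) ** 2 / vk for yk, vk in zip(y, var)) / len(y)

# Hypothetical spectral vector and component variances.
y = [1 + 1j, 2 - 1j, 0.5j, -3 + 0j]
var = [1.0, 2.0, 0.5, 4.0]
g_hat = ml_gain(y, var)
```

In a mixture HMM, an estimate of this kind would presumably be computed separately for each component when evaluating its likelihood; the abstract does not say how the paper handles this.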

Two additive noise compensation approaches for the spectral modeling scheme are also considered. The first approach requires a full noise HMM. This HMM is combined with the clean speech HMM to yield a noisy speech HMM. The second approach requires only the spectral shape of the noise. A term dependent on the spectral shape, together with an unknown magnitude term, is incorporated into the clean speech HMM to yield a noisy speech HMM. The unknown magnitude of the noise is estimated via the ML criterion. The performance of these two approaches for isolated digit recognition in noise is demonstrated and compared to a robust cepstral modeling approach from the literature.
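
The two compensation ideas can be sketched for a single mixture component, assuming speech and noise are independent zero-mean complex Gaussians so that their per-coefficient variances add. The first function combines known speech and noise variances; the second picks the unknown noise magnitude by a coarse grid search over the likelihood. The grid search is a simple stand-in chosen for illustration, since the abstract does not describe the paper's estimation procedure, and all names and values here are hypothetical.

```python
import math

def loglik(y, var):
    """Log-likelihood of complex vector y under a zero-mean complex
    Gaussian with diagonal covariance var[k]."""
    return sum(-math.log(math.pi * v) - abs(yk) ** 2 / v
               for yk, v in zip(y, var))

def combine_variances(speech_var, noise_var):
    """Noisy-speech variances under the additive, independent-noise
    assumption: the variances simply add per coefficient."""
    return [s + n for s, n in zip(speech_var, noise_var)]

def ml_noise_magnitude(y, speech_var, noise_shape, grid):
    """Pick the noise magnitude a on the grid that maximizes the
    likelihood of y with variances speech_var[k] + a * noise_shape[k].
    (Grid search is an illustrative stand-in, not the paper's method.)"""
    return max(
        grid,
        key=lambda a: loglik(
            y, combine_variances(speech_var, [a * n for n in noise_shape])
        ),
    )

# Hypothetical clean-speech variances, flat noise shape, and noisy observation.
speech_var = [1.0, 2.0, 0.5, 4.0]
noise_shape = [1.0, 1.0, 1.0, 1.0]
y = [2 + 1j, 1 - 2j, 1 + 1j, -3 + 1j]
grid = [0.1 * i for i in range(1, 101)]

a_hat = ml_noise_magnitude(y, speech_var, noise_shape, grid)
noisy_var = combine_variances(speech_var, [a_hat * n for n in noise_shape])
```

A finer grid or a one-dimensional numerical optimizer could replace the grid search; the speech gain is held fixed here to keep the sketch short.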


doi: 10.21437/ICSLP.2000-854

Cite as: Roberts, W.J.J., Furui, S. (2000) Robust speech recognition via modeling spectral coefficients with HMM's with complex Gaussian components. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 484-487, doi: 10.21437/ICSLP.2000-854

@inproceedings{roberts00_icslp,
  author={William J.J. Roberts and Sadaoki Furui},
  title={{Robust speech recognition via modeling spectral coefficients with HMM's with complex Gaussian components}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 484-487},
  doi={10.21437/ICSLP.2000-854}
}