Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Selective Training of HMMs by Using Two-Stage Clustering

Shoei Sato, Toru Imai, Hideki Tanaka, Akio Ando

NHK (Nippon Hoso Kyokai; Japan Broadcasting Corp.) Science and Technical Research Laboratories, Setagaya-ku, Tokyo, Japan

This paper proposes a method of constructing acoustic models from training data clustered in two stages. In the first stage, training data from a target task are clustered, and GMMs are generated for each cluster. In the second stage, the GMMs are used to select training data from a large-scale database based on the GMM likelihood. An acoustic model is then adapted for each cluster by MAP estimation using the selected training data. In decoding, the best acoustic model is selected from among all the acoustic models based on the GMM likelihood computed on some initial frames of an input utterance. Broadcast news transcription experiments showed that the proposed models achieved a word error reduction of 20% and a processing time reduction of 22% compared with a non-clustered model.
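
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: utterances are clustered by k-means on their mean feature vector, a single diagonal Gaussian stands in for each cluster's GMM, and database utterances are kept when their average per-frame log-likelihood under the best-matching cluster exceeds a hypothetical `threshold`. All function names are hypothetical, and the MAP adaptation of the HMMs themselves is left out.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed clustering method); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def fit_gaussian(frames, floor=1e-3):
    """Diagonal Gaussian as a one-component stand-in for a cluster GMM."""
    return frames.mean(axis=0), frames.var(axis=0) + floor

def avg_loglik(frames, model):
    """Average per-frame log-likelihood of `frames` under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return ll.mean()

def train_cluster_models(target_utts, k):
    """Stage 1: cluster target-task utterances and fit one model per cluster."""
    summaries = np.stack([u.mean(axis=0) for u in target_utts])
    labels = kmeans(summaries, k)
    models = []
    for c in range(k):
        members = [u for u, l in zip(target_utts, labels) if l == c]
        if members:  # skip clusters that ended up empty
            models.append(fit_gaussian(np.concatenate(members)))
    return models

def select_training_data(database_utts, models, threshold):
    """Stage 2: assign each database utterance to its best-scoring cluster,
    keeping it only if the score reaches the (hypothetical) threshold."""
    selected = {c: [] for c in range(len(models))}
    for i, utt in enumerate(database_utts):
        scores = [avg_loglik(utt, m) for m in models]
        best = int(np.argmax(scores))
        if scores[best] >= threshold:
            selected[best].append(i)
    return selected

def select_model(utt, models, n_init=10):
    """Decoding: pick the cluster model that best matches the first frames."""
    head = utt[:n_init]
    return int(np.argmax([avg_loglik(head, m) for m in models]))
```

In this sketch, the utterance indices returned by `select_training_data` for each cluster would feed the MAP adaptation of that cluster's acoustic model, and `select_model` would choose which adapted model decodes an incoming utterance. Scoring only the first `n_init` frames reflects the abstract's use of some initial frames of the input for model selection.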


Bibliographic reference. Sato, Shoei / Imai, Toru / Tanaka, Hideki / Ando, Akio (2000): "Selective training of HMMs by using two-stage clustering", in ICSLP-2000, vol. 3, 726-729.