Sixth International Conference on Spoken Language Processing
This paper proposes a method of constructing acoustic models from training data clustered in two stages. In the first stage, training data from a target task are clustered and GMMs are generated for each cluster. In the second stage, the GMMs are used to select training data from a large-scale database based on the GMM likelihood. An acoustic model for each cluster is then adapted by MAP estimation using the selected training data. In decoding, the best acoustic model is selected from all the acoustic models based on the GMM likelihood of some initial frames of an input utterance. Broadcast news transcription experiments showed that the proposed models reduced word errors by 20% and processing time by 22% compared with a non-clustered model.
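The decoding-time selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance GMMs, an average per-frame log-likelihood as the score, and a hypothetical `n_initial` of 50 frames, since the abstract says only "some initial frames". The function names and the synthetic data are likewise illustrative.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    The diagonal covariance is an assumption of this sketch.
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, K)
    log_weighted = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def select_cluster(frames, gmms, n_initial=50):
    """Score only the first n_initial frames against every cluster GMM
    and return the index of the best-scoring cluster plus all scores.
    n_initial=50 is a hypothetical value, not taken from the paper."""
    head = frames[:n_initial]
    scores = [gmm_avg_loglik(head, w, m, v) for (w, m, v) in gmms]
    return int(np.argmax(scores)), scores

# Synthetic demonstration: two single-component cluster GMMs in 3 dimensions.
gmms = [
    (np.array([1.0]), np.zeros((1, 3)), np.ones((1, 3))),       # cluster 0, mean 0
    (np.array([1.0]), np.full((1, 3), 5.0), np.ones((1, 3))),   # cluster 1, mean 5
]
rng = np.random.default_rng(0)
frames = rng.normal(5.0, 1.0, size=(200, 3))  # utterance resembling cluster 1
best, scores = select_cluster(frames, gmms)
print(best)
```

In a full system the returned index would pick the MAP-adapted acoustic model used to decode the rest of the utterance. The same likelihood score could plausibly serve in the second stage to rank utterances from the large-scale database, although the abstract does not state the selection criterion in detail.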
Bibliographic reference. Sato, Shoei / Imai, Toru / Tanaka, Hideki / Ando, Akio (2000): "Selective training of HMMs by using two-stage clustering", In ICSLP-2000, vol.3, 726-729.