We propose an unsupervised learning algorithm that learns hierarchical patterns of word sequences in spoken-language utterances. It extracts cluster rules from training data, based on high n-gram language model probabilities, that cluster words or segment a sentence. Cluster trees, analogous to parse trees, are then constructed from the learned cluster rules; this hierarchical clustering adds grammatical structure to a traditional trigram language model. The learned cluster rules are used to rescore and improve the n-best utterance hypothesis list output by a speech recognizer from acoustic and trigram language model scores. Our hierarchical cluster language model was trained on TREC broadcast news data from 1995 and 1996, and reduced the word error rate on the HUB-4 1997 broadcast news development set by 0.3% absolute. Prior symbolic knowledge in the form of rules can also be incorporated by simply applying those rules to the training data before the applicable learning iteration.
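To make the pipeline concrete, the following is a minimal Python sketch of the general idea: extract merge rules for adjacent word pairs whose n-gram probability is high, apply them repeatedly to build cluster trees, and reward hypotheses that the rules cover when re-ranking an n-best list. All function names, the bigram merge criterion, the threshold, and the scoring weight are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only; the merge criterion, threshold, and rescoring
# weight below are assumptions, not the authors' exact method.
from collections import Counter

def bigram_probs(sentences):
    """Estimate P(w2 | w1) from tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

def extract_cluster_rules(sentences, threshold=0.5):
    """Keep adjacent pairs whose conditional probability is high;
    each surviving pair becomes a cluster rule."""
    return {pair for pair, p in bigram_probs(sentences).items() if p >= threshold}

def apply_rules(words, rules):
    """One left-to-right pass that merges rule-covered adjacent pairs into
    a single node; repeated passes would grow a cluster tree bottom-up."""
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in rules:
            out.append((words[i], words[i + 1]))  # internal tree node
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

def rescore(hypotheses, rules, weight=1.0):
    """Re-rank (score, words) hypotheses: add a bonus per rule application
    on top of the recognizer's acoustic + trigram score."""
    def combined(hyp):
        score, words = hyp
        return score + weight * (len(words) - len(apply_rules(words, rules)))
    return sorted(hypotheses, key=combined, reverse=True)

For example, training on sentences containing "new york" makes ("new", "york") a rule, so an n-best entry preserving that pair receives a bonus over a competing hypothesis such as "knew york" and can move up the list. Prior symbolic rules could be injected by adding pairs to the rule set before a learning pass, in the spirit of the abstract's final sentence.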
Cite as: Jang, P.J., Hauptmann, A.G. (1998) Hierarchical cluster language modeling with statistical rule extraction for rescoring n-best hypotheses during speech decoding. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0934, doi: 10.21437/ICSLP.1998-647
@inproceedings{jang98b_icslp,
  author={Photina Jaeyoun Jang and Alexander G. Hauptmann},
  title={{Hierarchical cluster language modeling with statistical rule extraction for rescoring n-best hypotheses during speech decoding}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0934},
  doi={10.21437/ICSLP.1998-647}
}