In a maximum a posteriori probability approach to speech recognition, stochastic n-gram language models are used to estimate the a priori probability of a word sequence. In any practical implementation of a large vocabulary speech recognition system, the language model acts as a hypothesis filter that has to discriminate between candidate words with similar acoustic evidence. For that purpose, the combination of word based and class based language models is attractive, because it allows falling back to the more reliable estimates of the class based model in the case of sparse training data. However, a class language model can distinguish between words from the same class only in terms of their a priori probability. To improve the discriminative power for words with similar acoustic scores, it is therefore useful to put similar sounding words into different classes.
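To make the combination scheme concrete, the following is a minimal Python sketch of one common way to interpolate a word trigram with a class trigram; all names (p_word, p_class, p_membership, word2class) and the weight LAMBDA are hypothetical and not taken from the paper, which does not specify its combination formula in the abstract.

LAMBDA = 0.7  # interpolation weight for the word trigram (assumed value)

def interpolated_prob(w1, w2, w3, p_word, p_class, p_membership, word2class):
    """P(w3 | w1 w2) from a word trigram interpolated with a class trigram.

    p_word:       dict (w1, w2, w3) -> word trigram relative-frequency estimate
    p_class:      dict (c1, c2, c3) -> class trigram probability
    p_membership: dict (w, c) -> P(w | c), the within-class word probability
    word2class:   dict w -> class label
    """
    c1, c2, c3 = word2class[w1], word2class[w2], word2class[w3]
    pw = p_word.get((w1, w2, w3), 0.0)
    # The class model backs off sparse word trigrams: the class trigram
    # probability is weighted by the word's within-class probability.
    pc = p_class.get((c1, c2, c3), 0.0) * p_membership.get((w3, c3), 0.0)
    return LAMBDA * pw + (1.0 - LAMBDA) * pc

Because P(w3 | c3) is the only place where two words of the same class differ, placing acoustically confusable words in different classes lets the class trigram itself separate them, which is the motivation stated above.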
Based on the above considerations, this paper presents an automatic procedure for the optimal classification of a large vocabulary into classes of acoustically dissimilar words. In combination with a standard word based trigram model, the resulting acoustic class language model provides a relative reduction in word error rate of up to 16 percent and performs slightly better than an automatically created class language model that minimizes perplexity.
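The abstract does not spell out the classification procedure itself, so the following greedy sketch only illustrates the underlying idea of acoustically dissimilar classes: each word is assigned to the class whose current members sound least like it. The phone-string edit distance is an assumed stand-in for the paper's acoustic similarity measure, and the function names are hypothetical.

from typing import List

def phone_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over phone sequences -- a crude stand-in
    for acoustic similarity (an assumption, not the paper's measure)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

def classify(lexicon: dict, num_classes: int) -> List[List[str]]:
    """Greedily place each word into the class whose current members
    sound least like it, so that similar-sounding words end up apart."""
    classes: List[List[str]] = [[] for _ in range(num_classes)]
    for word, phones in lexicon.items():
        def dissimilarity(members: List[str]) -> float:
            if not members:
                return float("inf")  # an empty class is maximally safe
            # Score a class by its most confusable (closest) member.
            return min(phone_edit_distance(phones, lexicon[m]) for m in members)
        best = max(range(num_classes), key=lambda k: dissimilarity(classes[k]))
        classes[best].append(word)
    return classes

# Example: homophones such as "two" /T UW/ and "to" /T UW/ are forced into
# different classes, so the class trigram can still separate them.
classes = classify({"two": ["T", "UW"], "to": ["T", "UW"],
                    "ten": ["T", "EH", "N"]}, num_classes=2)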
Cite as: Fischer, V., Kunzmann, S.J. (2000) Acoustic language model classes for a large vocabulary continuous speech recognizer. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 810-813, doi: 10.21437/ICSLP.2000-658
@inproceedings{fischer00_icslp,
  author={Volker Fischer and S. J. Kunzmann},
  title={{Acoustic language model classes for a large vocabulary continuous speech recognizer}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 810-813},
  doi={10.21437/ICSLP.2000-658}
}