In this paper, we describe a new HMM-based multiphone method developed to reduce the amount of training data and the training time, as well as to alleviate the effect of contextual holes. We define a multiphone as any phoneme combination consisting of fewer than four phonemes. To reduce the number of units, we train only the most frequently occurring multiphones rather than all of them. Recognition results are obtained by applying our method to the Japanese Common Speech Data Corpus. The results for the training vocabulary show that our method achieves the same recognition accuracy as triphone HMMs. For the non-training vocabulary, we demonstrate that our method reduces the error rate by as much as 70% compared to the triphone method. We also propose a two-stage search algorithm based on a pre-selection step followed by a detailed A* search. We show that, compared to the Viterbi beam search, the two-stage search algorithm requires only 70% of the computing time without reducing recognition accuracy.
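The unit-selection step described above (keeping only the most frequently occurring multiphones, each consisting of fewer than four phonemes) can be illustrated with a minimal sketch in Python. The paper does not publish code; the function name, the toy corpus, and the cutoff parameter below are illustrative assumptions, not the authors' implementation.

from collections import Counter

def select_multiphones(phoneme_sentences, num_units):
    # Count all phoneme n-grams of length 1-3 (multiphones, i.e. fewer than
    # four phonemes) over a phoneme-transcribed corpus, then keep only the
    # most frequent ones as the HMM unit inventory.
    counts = Counter()
    for phones in phoneme_sentences:
        for n in (1, 2, 3):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    return [unit for unit, _ in counts.most_common(num_units)]

# Illustrative usage with a toy phoneme-transcribed corpus (not from the paper).
corpus = [["k", "o", "n", "n", "i", "ch", "i", "w", "a"],
          ["o", "h", "a", "y", "o", "o"]]
print(select_multiphones(corpus, num_units=10))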
Cite as: Yi, J., Miki, K. (1992) A new method of speaker-independent speech recognition using multiphone HMM. Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992), 1471-1474, doi: 10.21437/ICSLP.1992-182
@inproceedings{yi92_icslp,
  author={Jie Yi and Kei Miki},
  title={{A new method of speaker-independent speech recognition using multiphone HMM}},
  year=1992,
  booktitle={Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992)},
  pages={1471--1474},
  doi={10.21437/ICSLP.1992-182}
}