Third International Conference on Spoken Language Processing (ICSLP 94)
Out-of-vocabulary words and ungrammatical utterances are two major problems in speech recognition. We believe that improving the acoustic model is essential to dealing with these problems. We propose using a 'phonetic typewriter' as an evaluation method. Unlike common approaches, which evaluate the acoustic and language models together, this allows direct evaluation of the acoustic model alone. A comparison of context-independent phone models based on continuous-mixture HMMs (20 mixtures per state) with context-dependent phone models based on HMnet (3 mixtures per state) showed that the phoneme error rate can be halved by using the latter models. The same 'phonetic typewriter' paradigm can also be used directly as a speech recognition method, in which speech is recognized as a string of phonemes without constraints on vocabulary or grammar. We show that over 97% phoneme recognition accuracy can be achieved with our best acoustic model.
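As an illustration of how a phoneme error rate or accuracy figure of this kind is commonly computed, the following is a minimal sketch. It assumes the conventional scoring in which the recognized phoneme string is aligned to the reference by minimum edit distance and accuracy is (N - S - D - I) / N; the abstract does not state the exact scoring procedure used, so this is an assumption. The function names `phoneme_errors` and `phoneme_accuracy` are hypothetical.

```python
def phoneme_errors(ref, hyp):
    """Count (substitutions, deletions, insertions) in a minimum-edit
    alignment of a hypothesis phoneme sequence against a reference.
    Assumed scoring scheme; not taken from the paper."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)   # every reference phoneme deleted
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)   # every hypothesis phoneme inserted
    for i in range(1, rows):
        for j in range(1, cols):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (cost, subs, dels, ins)          # match
            else:
                best = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            if cost + 1 < best[0]:
                best = (cost + 1, subs, dels + 1, ins)  # deletion
            cost, subs, dels, ins = dp[i][j - 1]
            if cost + 1 < best[0]:
                best = (cost + 1, subs, dels, ins + 1)  # insertion
            dp[i][j] = best
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def phoneme_accuracy(ref, hyp):
    """Accuracy = (N - S - D - I) / N, with N the reference length."""
    subs, dels, ins = phoneme_errors(ref, hyp)
    return (len(ref) - subs - dels - ins) / len(ref)


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]   # toy symbols, not real data
    recognized = ["a", "b", "x", "d"]  # one substitution
    print(phoneme_errors(reference, recognized))
    print(phoneme_accuracy(reference, recognized))
```

Because insertions are counted as errors, this accuracy measure can fall below the simple percentage of correctly recognized phonemes, which matters for an unconstrained phoneme recognizer that is free to emit spurious symbols.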
Bibliographic reference. Singer, Harold / Takami, Jun-ichi (1994): "Speech recognition without grammar or vocabulary constraints", In ICSLP-1994, 2207-2210.