Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Speech Recognition without Grammar or Vocabulary Constraints

Harold Singer, Jun-ichi Takami

ATR Interpreting Telecommunications Res. Labs., Kyoto, Japan

Out-of-vocabulary words and ungrammatical utterances are two major problems in speech recognition. We believe that improving the acoustic model is essential in dealing with these problems. We propose using a 'phonetic typewriter' as an evaluation method. Unlike common approaches, which evaluate the acoustic and language models together, this allows direct evaluation of the acoustic model. A comparison of context-independent phone models based on continuous-mixture HMMs (20 mixtures per state) with context-dependent phone models based on HMnet [4] (3 mixtures per state) showed that the phoneme error rate can be halved by using the latter models. The same 'phonetic typewriter' paradigm can also be used directly as a speech recognition method, in which speech is recognized as a string of phonemes without constraints on vocabulary or grammar. We show that over 97% phoneme recognition accuracy can be achieved with our best acoustic model.
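The abstract does not say how phoneme error rate and phoneme recognition accuracy are scored. As an illustration only, the sketch below assumes the conventional definition, in which the recognized phoneme string is aligned to the reference by minimum edit distance and accuracy is 1 - (substitutions + deletions + insertions) / (number of reference phonemes). The function names and the example phoneme strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref, hyp):
    """Accuracy under the assumed definition: 1 - errors / len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one phoneme of a nine-phoneme reference is deleted.
ref = "k o N n i ch i w a".split()
hyp = "k o N i ch i w a".split()
errors = edit_distance(ref, hyp)
accuracy = phoneme_accuracy(ref, hyp)
```

Under this assumed definition, the phoneme error rate is simply 1 minus the accuracy, so halving the error rate means halving the normalized edit distance against the reference transcription.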

Bibliographic reference. Singer, Harold / Takami, Jun-ichi (1994): "Speech recognition without grammar or vocabulary constraints", in ICSLP-1994, 2207-2210.