7th International Conference on Spoken Language Processing, September 16-20, 2002
For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word. Unlike previous work on confidence measures, this paper studies features for confidence measures that are extracted from the outputs of more than one LVCSR model. More specifically, this paper experimentally evaluates whether the agreement among the outputs of multiple Japanese LVCSR models is effective as an estimate of confidence for each hypothesized word. The results of the experimental evaluation show that the agreement between the outputs of two LVCSR models with different decoders and acoustic models can achieve quite reliable confidence. Furthermore, among the various features of acoustic models based on Gaussian mixture HMMs, it is concluded that features such as whether or not short pause models are used, as well as the choice of unit in the HMMs (e.g., triphone model or syllable model), are the most effective in achieving highly reliable confidence.
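The idea of agreement-based confidence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the outputs of the two recognizers are aligned, so the sketch assumes a simple longest-matching-block alignment from Python's standard `difflib` module. The function name `agreement_flags` and the romanized toy hypotheses are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def agreement_flags(hyp_a, hyp_b):
    """Mark each word of hyp_a as agreed (True) if it aligns with the same
    word in hyp_b, otherwise False.

    Assumption: alignment uses difflib's matching blocks; the paper may
    align the two recognizer outputs differently.
    """
    flags = [False] * len(hyp_a)
    matcher = SequenceMatcher(a=hyp_a, b=hyp_b, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block covers hyp_a[block.a : block.a + block.size]
        for i in range(block.a, block.a + block.size):
            flags[i] = True
    return list(zip(hyp_a, flags))


# Toy outputs of two hypothetical recognizers for the same utterance
output_a = ["kyou", "wa", "ii", "tenki", "desu"]
output_b = ["kyou", "wa", "ii", "genki", "desu"]

for word, agreed in agreement_flags(output_a, output_b):
    print(word, "confident" if agreed else "not confident")
```

In this toy case every word except `tenki` is hypothesized by both recognizers and would be treated as confident. How reliable such agreement is in practice depends on how much the two models differ, which is the question the paper evaluates.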
Bibliographic reference. Utsuro, Takehito / Harada, Tetsuji / Nishizaki, Hiromitsu / Nakagawa, Seiichi (2002): "A confidence measure based on agreement among multiple LVCSR models - correlation between pair of acoustic models and confidence", In ICSLP-2002, 701-704.