10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Word Confidence Using Duration Models

Stefano Scanzio (1), Pietro Laface (1), Daniele Colibro (2), Roberto Gemello (2)

(1) Politecnico di Torino, Italy
(2) Loquendo, Italy

In this paper, we propose a word confidence measure based on phone durations that depend on large contexts. The measure relies on the expected duration of each recognized phone in a word. In the proposed approach, the duration of each phone is in principle context-dependent, and the measure is a function of the distance between the observed and expected phone duration distributions within a word. Our experiments show that, because the “duration confidence” does not use any acoustic information, its Equal Error Rate (EER), measured in terms of False Acceptance and False Rejection rates, is not as good as that of the more informed acoustic confidence measure. However, combining the two measures by simple linear interpolation improves the system EER by 6% to 10% relative on an isolated word recognition task in several languages.
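
The combination and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the interpolation weight of 0.8, and the threshold-scanning approximation of the EER are all assumptions made for illustration, and the abstract does not say how the weight was chosen.

```python
from typing import List, Sequence, Tuple


def interpolate(acoustic: float, duration: float, weight: float = 0.8) -> float:
    """Linearly interpolate two confidence scores.

    `weight` is a hypothetical value; the abstract does not report one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * acoustic + (1.0 - weight) * duration


def error_rates(scores: Sequence[float], labels: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at a threshold.

    A word is accepted when its score is >= threshold. `labels` is True
    for correctly recognized words and False for recognition errors.
    Both classes must be present in `labels`.
    """
    wrong: List[float] = [s for s, ok in zip(scores, labels) if not ok]
    right: List[float] = [s for s, ok in zip(scores, labels) if ok]
    false_accept = sum(s >= threshold for s in wrong) / len(wrong)
    false_reject = sum(s < threshold for s in right) / len(right)
    return false_accept, false_reject


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Approximate the EER by scanning the observed scores as thresholds.

    Picks the threshold where the two error rates are closest and
    returns their mean (an assumed, simple approximation).
    """
    candidates = (error_rates(scores, labels, t) for t in sorted(set(scores)))
    fa, fr = min(candidates, key=lambda rates: abs(rates[0] - rates[1]))
    return (fa + fr) / 2.0
```

In this sketch one would compute `equal_error_rate` once on the acoustic scores alone and once on the interpolated scores, then compare the two values to obtain a relative improvement.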

Bibliographic reference.  Scanzio, Stefano / Laface, Pietro / Colibro, Daniele / Gemello, Roberto (2009): "Word confidence using duration models", In INTERSPEECH-2009, 1207-1210.