This paper investigates time- and acoustic-mediated alignment algorithms that can improve speech recognition evaluation. The edit-cost function, which weights speech unit matches, substitutions, deletions, and insertions, is defined as a function of timed symbols or even of speech signal segments. The algorithms are compared using several classical statistical measures of different types, which are derived from speech recognition confusion matrices and are normally used to measure the agreement between different classifications of the same set of objects. These measures give a reasonable indication that the investigated algorithms yield more relevant speech recognition error statistics than the algorithms commonly used for this purpose.
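To make the idea of an edit cost defined over timed symbols concrete, the following is a minimal sketch of a time-mediated dynamic-programming alignment. The abstract does not give the cost function used in the paper, so the costs here are illustrative assumptions: a match or substitution costs the sum of the start- and end-time differences (plus a small penalty of 0.001 when the labels differ), and a deletion or insertion costs the duration of the unit. The names `Timed`, `sub_cost`, `gap_cost`, and `align` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Timed(NamedTuple):
    """A speech unit label with start and end times in seconds."""
    label: str
    start: float
    end: float


def sub_cost(r: Timed, h: Timed) -> float:
    # Assumed cost: boundary mismatch, plus a small penalty if labels differ.
    cost = abs(r.start - h.start) + abs(r.end - h.end)
    return cost if r.label == h.label else cost + 0.001


def gap_cost(u: Timed) -> float:
    # Assumed cost of a deletion or insertion: the duration of the unit.
    return u.end - u.start


def align(ref: List[Timed], hyp: List[Timed]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (ref_label, hyp_label) pairs; None marks a deletion or insertion."""
    n, m = len(ref), len(hyp)
    # D[i][j] is the minimal cost of aligning ref[:i] with hyp[:j].
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap_cost(ref[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost(hyp[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),
                D[i - 1][j] + gap_cost(ref[i - 1]),
                D[i][j - 1] + gap_cost(hyp[j - 1]),
            )
    # Backtrace by recomputing the same expressions that produced each cell.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]):
            pairs.append((ref[i - 1].label, hyp[j - 1].label))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap_cost(ref[i - 1]):
            pairs.append((ref[i - 1].label, None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1].label))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


ref = [Timed("x", 0.0, 0.4), Timed("y", 0.4, 0.8), Timed("z", 0.8, 1.2)]
hyp = [Timed("w", 0.8, 1.2)]
print(align(ref, hyp))  # [('x', None), ('y', None), ('z', 'w')]
```

In this example a purely symbolic edit distance cannot decide which reference unit "w" replaces, since all three choices cost one substitution and two deletions. Under the assumed time-based costs, "w" is paired with "z" because the two occupy the same time span. An acoustic-mediated variant would replace `sub_cost` with a distance computed from the underlying signal segments.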
Bibliographic reference. Dobrišek, Simon / Mihelič, France (2011): "Time- and acoustic-mediated alignment algorithms for speech recognition evaluation", In INTERSPEECH-2011, 1517-1520.