ISCA Archive Interspeech 2009

An improved speech segmentation quality measure: the r-value

Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar

Phone segmentation in ASR is usually performed indirectly by Viterbi decoding of HMM output. Direct approaches also exist, e.g., blind speech segmentation algorithms. In either case, the performance of automatic speech segmentation algorithms is often measured with automated evaluation algorithms, and the resulting scores are used to optimize the segmentation system. However, the evaluation approaches reported in the literature were found to be lacking. We have also determined that increases in phone boundary detection rates are often due to increased over-segmentation rather than to algorithmic improvements: with current quality measures, simply adding random boundaries yields a better hit rate. Since established measures were found to be insensitive to this type of random boundary insertion, a new R-value quality measure is introduced that indicates how close a segmentation algorithm’s performance is to an ideal point of operation.
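The effect described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formula, so the R-value expression below follows the form in which the measure is commonly quoted (hit rate HR and over-segmentation OS in percent, ideal point at HR = 100, OS = 0) and should be checked against the full paper. The function names, the 20 ms tolerance, and the greedy one-to-one boundary matching are assumptions made for illustration, not the authors' evaluation code.

```python
import math


def count_hits(reference, detected, tolerance=0.02):
    """Count reference boundaries matched by a detected boundary.

    Greedy left-to-right matching (an assumption of this sketch):
    each detected boundary can match at most one reference boundary,
    and a match requires |detected - reference| <= tolerance (seconds).
    """
    ref = sorted(reference)
    det = sorted(detected)
    hits = 0
    j = 0
    for r in ref:
        # Skip detected boundaries that lie too early to match r.
        while j < len(det) and det[j] < r - tolerance:
            j += 1
        if j < len(det) and abs(det[j] - r) <= tolerance:
            hits += 1
            j += 1  # consume this detected boundary
    return hits


def segmentation_scores(reference, detected, tolerance=0.02):
    """Return (HR, OS, R): hit rate and over-segmentation in percent, and R-value."""
    n_ref = len(reference)
    n_det = len(detected)
    hits = count_hits(reference, detected, tolerance)

    hr = 100.0 * hits / n_ref               # hit rate, percent
    os_ = 100.0 * (n_det / n_ref - 1.0)     # over-segmentation, percent

    # Distance from the ideal point (HR = 100, OS = 0).
    r1 = math.sqrt((100.0 - hr) ** 2 + os_ ** 2)
    # Signed distance term that penalizes over-segmentation.
    r2 = (-os_ + hr - 100.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 200.0
    return hr, os_, r_value


# Hypothetical boundary times in seconds.
reference = [0.10, 0.25, 0.40, 0.55]
exact = [0.10, 0.25, 0.40, 0.55]
# Same hits, plus four extra boundaries inserted between them.
padded = [0.05, 0.10, 0.18, 0.25, 0.33, 0.40, 0.48, 0.55]

print(segmentation_scores(reference, exact))
print(segmentation_scores(reference, padded))
```

In this toy case both detections reach the same hit rate, but the padded one has a much lower R-value because its over-segmentation moves it away from the ideal operating point. This is the sensitivity that a hit rate alone lacks.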

doi: 10.21437/Interspeech.2009-538

Cite as: Räsänen, O.J., Laine, U.K., Altosaar, T. (2009) An improved speech segmentation quality measure: the r-value. Proc. Interspeech 2009, 1851-1854, doi: 10.21437/Interspeech.2009-538

  author={Okko Johannes Räsänen and Unto Kalervo Laine and Toomas Altosaar},
  title={{An improved speech segmentation quality measure: the r-value}},
  booktitle={Proc. Interspeech 2009},