Phone segmentation in automatic speech recognition (ASR) is usually performed indirectly, through Viterbi decoding of hidden Markov model (HMM) output. Direct approaches also exist, e.g., blind speech segmentation algorithms. In either case, the performance of automatic speech segmentation algorithms is often measured with automated evaluation algorithms, and these measurements are used to optimize the segmentation system's performance. However, the evaluation approaches reported in the literature were found to be lacking. We have also determined that increases in phone boundary detection rates are often due to higher levels of over-segmentation rather than to algorithmic improvements; that is, with the current quality measures, a better hit rate can be achieved simply by adding random boundaries. Since established measures were found to be insensitive to this type of random boundary insertion, a new quality measure, the R-value, is introduced that indicates how close a segmentation algorithm's performance is to an ideal point of operation.
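The argument about random boundary insertion can be checked numerically. The sketch below computes the hit rate (HR), the over-segmentation (OS), and an R-value using the formulation commonly attributed to this paper: r1 = sqrt((100 - HR)^2 + OS^2), r2 = (-OS + HR - 100)/sqrt(2), R = 1 - (|r1| + |r2|)/200, where the ideal point of operation is HR = 100 and OS = 0. Several details are assumptions made for illustration and are not taken from the abstract: the helper names `count_hits` and `segmentation_scores` are hypothetical, boundaries are times in seconds, a reference boundary counts as hit when an unused hypothesized boundary lies within a 20 ms tolerance, and each hypothesized boundary is matched at most once using a greedy nearest-neighbour rule.

```python
import math
from typing import Sequence, Tuple


def count_hits(ref: Sequence[float], hyp: Sequence[float], tol: float = 0.02) -> int:
    """Count reference boundaries matched by a hypothesized boundary within `tol` seconds.

    Assumption: greedy one-to-one matching, where each reference boundary takes the
    nearest still-unused hypothesized boundary inside the tolerance window.
    """
    used = set()
    hits = 0
    for r in sorted(ref):
        best_idx = None
        best_dist = None
        for j, h in enumerate(hyp):
            if j in used:
                continue
            d = abs(h - r)
            if d <= tol and (best_dist is None or d < best_dist):
                best_idx, best_dist = j, d
        if best_idx is not None:
            used.add(best_idx)
            hits += 1
    return hits


def segmentation_scores(
    ref: Sequence[float], hyp: Sequence[float], tol: float = 0.02
) -> Tuple[float, float, float]:
    """Return (hit rate in %, over-segmentation in %, R-value)."""
    if not ref:
        raise ValueError("reference segmentation must contain at least one boundary")
    hr = 100.0 * count_hits(ref, hyp, tol) / len(ref)
    # Over-segmentation: how many more (or fewer) boundaries were found than exist.
    os_ = 100.0 * (len(hyp) / len(ref) - 1.0)
    # r1: Euclidean distance from the ideal point (HR = 100, OS = 0).
    r1 = math.sqrt((100.0 - hr) ** 2 + os_ ** 2)
    # r2: signed distance from the line along which random boundary insertion moves HR and OS.
    r2 = (-os_ + hr - 100.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 200.0
    return hr, os_, r_value


if __name__ == "__main__":
    ref = [round(0.1 * k, 2) for k in range(1, 11)]       # 10 reference boundaries
    dense = [round(0.01 * i, 2) for i in range(101)]      # a boundary every 10 ms
    print(segmentation_scores(ref, ref))    # ideal: HR = 100, OS = 0, R = 1
    print(segmentation_scores(ref, dense))  # HR = 100 as well, but R is strongly negative
```

In this toy example, placing a boundary every 10 ms reaches the same 100% hit rate as the perfect segmentation, so the hit rate alone cannot tell the two apart; the R-value separates them because the dense output sits far from the ideal point of operation.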
Cite as: Räsänen, O.J., Laine, U.K., Altosaar, T. (2009) An improved speech segmentation quality measure: the r-value. Proc. Interspeech 2009, 1851-1854, doi: 10.21437/Interspeech.2009-538
@inproceedings{rasanen09b_interspeech, author={Okko Johannes Räsänen and Unto Kalervo Laine and Toomas Altosaar}, title={{An improved speech segmentation quality measure: the r-value}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={1851--1854}, doi={10.21437/Interspeech.2009-538} }