This paper proposes a two-stage sample-based phone boundary detection algorithm. In the first stage, local sample-based acoustic parameters are used to pre-select phone boundary candidates. In the second stage, high-order statistics of the log-likelihood differences between the two adjacent speech segments around each boundary candidate are calculated and serve as a similarity measure for candidate verification. Experimental results on the TIMIT speech corpus show that equal error rates (EERs) of 8.6% and 7.6% were achieved for one-stage and two-stage sample-based phone boundary detection, respectively. Moreover, for the two-stage system, 42.1% and 81.9% of the detected boundaries were within 5-sample and 15-sample error tolerances, respectively, of the manual labeling results.
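The tolerance figures above can be illustrated with a minimal sketch of how such a hit rate might be computed. This is not the paper's evaluation code: the function name `fraction_within_tolerance`, the nearest-reference matching rule, and the toy boundary positions are all assumptions made for illustration, with the detected boundaries taken as the denominator as the wording above suggests.

```python
from bisect import bisect_left


def fraction_within_tolerance(detected, reference, tolerance):
    """Fraction of detected boundaries within `tolerance` samples of the
    nearest reference (manually labeled) boundary.

    Hypothetical helper: matching each detection to its nearest reference
    boundary is an assumption, not the paper's stated procedure.
    """
    if not detected:
        return 0.0
    ref = sorted(reference)
    hits = 0
    for d in detected:
        i = bisect_left(ref, d)
        # The nearest reference boundary is either just before or at/after d.
        neighbours = ref[max(i - 1, 0):i + 1]
        if neighbours and min(abs(d - r) for r in neighbours) <= tolerance:
            hits += 1
    return hits / len(detected)


# Toy sample indices, invented for illustration only.
detected = [100, 212, 330, 475]
reference = [103, 200, 331, 500]

print(fraction_within_tolerance(detected, reference, 5))   # 0.5
print(fraction_within_tolerance(detected, reference, 15))  # 0.75
```

With these toy values, two of the four detections fall within 5 samples of a reference boundary and three fall within 15 samples, which is why the rate rises as the tolerance widens.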
Bibliographic reference. Wang, Yih-Ru (2011): "A two-stage sample-based phone boundary detector using segmental similarity features", In INTERSPEECH-2011, 413-416.