Accurate placement of phone boundaries improves both the performance of speech recognition systems and the quality of concatenative speech synthesis. This study proposes a post-processing technique to refine the phone boundary locations provided by HMM-based forced alignment. Context-dependent Linear Discriminant Analysis (LDA) classifiers, together with a confidence scoring scheme, are used to locate phone boundaries more precisely. Not every acoustic feature is suitable for locating boundaries between every type of phonetic segment, so feature selection is performed separately for each boundary type. The proposed context-dependent refinement yields a 43.9% error reduction in locating phone boundaries compared with the boundaries obtained from HMM-based forced alignment. The average deviation from manually labeled boundaries is reduced from 1.4 to 1.0 frames, with a frame size of 10 milliseconds.
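As an illustration of the deviation measure quoted above, the following minimal sketch computes the mean absolute deviation between automatically placed and manually labeled boundaries, expressed in frames. The function name `mean_boundary_deviation`, the one-to-one pairing of boundaries, and the sample times are assumptions made for illustration; they are not taken from the paper, which may define the measure differently.

```python
def mean_boundary_deviation(predicted_ms, reference_ms, frame_ms=10.0):
    """Mean absolute deviation between paired boundaries, in frames.

    Assumes both lists hold boundary times in milliseconds and that the
    i-th predicted boundary corresponds to the i-th reference boundary.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not predicted_ms:
        raise ValueError("at least one boundary is required")
    total_ms = sum(abs(p - r) for p, r in zip(predicted_ms, reference_ms))
    return total_ms / len(predicted_ms) / frame_ms


# Hypothetical boundary times in milliseconds (not data from the paper).
reference = [100.0, 250.0, 400.0]
aligned = [110.0, 230.0, 410.0]
deviation_frames = mean_boundary_deviation(aligned, reference)
print(f"{deviation_frames:.2f} frames")
```

With a 10-millisecond frame, the reported averages of 1.4 and 1.0 frames correspond to 14 and 10 milliseconds respectively.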
Bibliographic reference. Sirinoot Boonsuk, Proadpran Punyabukkana, and Atiwong Suchato (2007): "Phone Boundary Detection using Selective Refinements and Context-dependent Acoustic Features", In INTERSPEECH-2007, 1362-1365.