INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Phone Boundary Detection using Selective Refinements and Context-dependent Acoustic Features

Sirinoot Boonsuk, Proadpran Punyabukkana, and Atiwong Suchato

Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand

Accurate placement of phone boundaries improves both the performance of speech recognition systems and the quality of concatenative speech synthesis. This study proposes a post-processing technique to refine the locations of phone boundaries provided by HMM-based forced alignment. Context-dependent Linear Discriminant Analysis (LDA) classifiers, together with a confidence scoring scheme, are used to improve the precision of locating phone boundaries. Not every acoustic feature is suitable for locating boundaries between every type of phonetic segment; therefore, feature selection is performed based on the boundary type. The proposed context-dependent refinement results in a 43.9% error reduction in locating phone boundaries compared to those obtained from HMM-based forced alignment. The average deviation from manually labeled boundaries is reduced from 1.4 to 1.0 frames, with a frame size of 10 milliseconds.
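The idea of refining a forced-alignment boundary only when a classifier is confident can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the search window, the scoring function, or the confidence threshold, so `window`, `min_confidence`, and the `score` callable below are hypothetical stand-ins (for example, `score` could wrap the posterior of an LDA boundary classifier chosen for the boundary type). The second function computes an average deviation in frames from reference boundaries, the kind of figure the abstract reports.

```python
from typing import Callable, Sequence


def refine_boundary(
    initial_frame: int,
    score: Callable[[int], float],
    num_frames: int,
    window: int = 3,              # hypothetical search radius, in frames
    min_confidence: float = 0.5,  # hypothetical confidence threshold
) -> int:
    """Return a refined boundary frame, or the initial one if confidence is low.

    Assumes 0 <= initial_frame < num_frames.
    """
    # Candidate frames around the forced-alignment boundary, clipped to the utterance.
    lo = max(0, initial_frame - window)
    hi = min(num_frames - 1, initial_frame + window)
    candidates = range(lo, hi + 1)
    # Pick the candidate that the (assumed) boundary classifier scores highest.
    best = max(candidates, key=score)
    # Selective refinement: move the boundary only when the score is high enough.
    return best if score(best) >= min_confidence else initial_frame


def mean_abs_deviation(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Average absolute deviation, in frames, from reference boundaries."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    # Toy score peaking at frame 12; a stand-in for a classifier posterior.
    def toy_score(frame: int) -> float:
        return max(0.0, 1.0 - 0.25 * abs(frame - 12))

    print(refine_boundary(10, toy_score, num_frames=100))
    # Multiply by the frame size (10 ms in the abstract) to express deviation in ms.
    print(mean_abs_deviation([10, 20], [12, 21]) * 10)
```

Falling back to the initial boundary when the best score is below the threshold is what makes the refinement selective in this sketch: boundaries are moved only where the classifier offers strong evidence, and left as the forced alignment placed them otherwise.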

Bibliographic reference. Sirinoot Boonsuk, Proadpran Punyabukkana, and Atiwong Suchato (2007): "Phone Boundary Detection using Selective Refinements and Context-dependent Acoustic Features", in INTERSPEECH-2007, 1362-1365.