INTERSPEECH 2006 - ICSLP
The ability to automatically align phonetic transcriptions with their associated acoustic signals is crucial to the development of computer-assisted speech training systems, which frequently need to locate phone boundaries in the signal given a known transcription. This paper describes an attempt to locate phone boundaries when only the number of boundaries is known. Phone boundaries are hypothesized solely from acoustic discontinuities, without knowledge of the exact transcription. This allows speech segmentation to be performed without a large amount of in-domain speech data for training. Boundary identification is done in two stages. In the first stage, candidate boundaries are selected at local maxima of spectral change. In the second stage, dynamic programming searches the candidate list for the best locations of the phone boundaries. Allowing a deviation of at most 20 ms from the actual boundaries, approximately 75% accuracy is achieved on a Thai continuous speech corpus.
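The two-stage procedure described above can be illustrated with a minimal Python sketch. Several details are assumptions not stated in the abstract: spectral change is taken as the Euclidean distance between consecutive feature frames, the dynamic-programming objective is to maximise the total spectral change of the selected candidates, a hypothetical `min_gap` parameter enforces a minimum spacing (in frames) between chosen boundaries, and `accuracy_within` uses one plausible tolerance-based scoring rule. All function names are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]]) -> List[float]:
    """Assumed measure: Euclidean distance between consecutive frames.

    change[t] scores a possible boundary between frame t and frame t + 1.
    """
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def local_maxima(change: Sequence[float]) -> List[int]:
    """Stage 1: keep indices that are strict peaks of the change curve."""
    return [t for t in range(1, len(change) - 1)
            if change[t - 1] < change[t] > change[t + 1]]


def select_boundaries(cands: Sequence[int], change: Sequence[float],
                      k: int, min_gap: int) -> List[int]:
    """Stage 2: choose exactly k candidates by dynamic programming.

    Maximises the summed spectral change of the chosen candidates, with
    neighbouring choices at least min_gap frames apart (hypothetical
    minimum-phone-duration constraint). Returns [] if no choice is feasible.
    """
    n = len(cands)
    if k <= 0 or n < k:
        return []
    neg = float("-inf")
    # best[j][i]: best score using j + 1 boundaries, the last being cands[i]
    best = [[neg] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = change[cands[i]]
    for j in range(1, k):
        for i in range(n):
            for p in range(i):
                if cands[i] - cands[p] >= min_gap and best[j - 1][p] > neg:
                    score = best[j - 1][p] + change[cands[i]]
                    if score > best[j][i]:
                        best[j][i], back[j][i] = score, p
    last = max(range(n), key=lambda i: best[k - 1][i])
    if best[k - 1][last] == neg:
        return []
    # Trace back from the best final boundary to recover the chosen set.
    path, i = [], last
    for j in range(k - 1, -1, -1):
        path.append(cands[i])
        i = back[j][i]
    return path[::-1]


def accuracy_within(hyp: Sequence[int], ref: Sequence[int], tol: int) -> float:
    """Assumed metric: share of reference boundaries with a hypothesis
    no more than tol frames away."""
    if not ref:
        return 0.0
    hits = sum(1 for r in ref if any(abs(h - r) <= tol for h in hyp))
    return hits / len(ref)


change = [0.1, 0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.2]
cands = local_maxima(change)                  # [1, 3, 6]
print(select_boundaries(cands, change, k=2, min_gap=3))  # [1, 6]
print(select_boundaries(cands, change, k=2, min_gap=1))  # [1, 3]
```

Without a spacing constraint the second stage would reduce to picking the k largest peaks; the `min_gap` constraint is what makes a dynamic-programming search necessary in this sketch. Assuming a 10 ms frame shift, the 20 ms tolerance reported above would correspond to `tol=2` in `accuracy_within`.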
Bibliographic reference. Leelaphattarakij, Pairote / Punyabukkana, Proadpran / Suchato, Atiwong (2006): "Locating phone boundaries from acoustic discontinuities using a two-staged approach", In INTERSPEECH-2006, paper 1734-Mon3CaP.10.