7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper presents a method for speaker-independent automatic phonetic alignment that is distinguished from standard HMM-based "forced alignment" in three respects: (1) specific acoustic-phonetic features are used, in addition to PLP features, by the phonetic classifier; (2) the units of classification consist of distinctive phonetic features instead of phonemes; and (3) observation probabilities depend not only on the current state, but also on the state transition information. The proposed method is compared with a state-of-the-art baseline forced-alignment system on a number of corpora, including telephone speech, microphone speech, and children’s speech. The new method achieves 92.57% agreement within 20 msec on the TIMIT corpus, a 26% relative reduction in error over the baseline method (89.95% agreement on TIMIT). The average reduction in error over all corpora is 28%.
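The reported figures can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the function names, the inclusive 20 msec threshold, the one-to-one pairing of boundaries, and the example boundary times are all assumptions. It shows how an agreement percentage within a tolerance might be computed, and how the 26% reduction in error follows from the two TIMIT agreement figures.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20):
    """Percentage of boundaries placed within tolerance_ms of the reference.

    Assumes both lists hold the same boundaries in the same order,
    and that a difference equal to the tolerance counts as agreement.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    hits = sum(
        1
        for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_agreement, new_agreement):
    """Relative reduction in error (percent), where error = 100 - agreement."""
    baseline_error = 100.0 - baseline_agreement
    new_error = 100.0 - new_agreement
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical boundary times in milliseconds, for illustration only.
reference = [100, 250, 400, 620]
hypothesis = [110, 280, 410, 630]
print(agreement_within(reference, hypothesis))  # 3 of 4 boundaries agree

# TIMIT figures from the abstract: baseline 89.95%, new method 92.57%.
print(round(relative_error_reduction(89.95, 92.57)))
```

With the abstract's numbers, the baseline error is 10.05% and the new error is 7.43%, so the reduction is 2.62 / 10.05, or about 26%.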
Bibliographic reference. Hosom, John-Paul (2002): "Automatic phoneme alignment based on acoustic-phonetic modeling", In ICSLP-2002, 357-360.