ISCA Archive ICSLP 1998

Techniques for accurate automatic annotation of speech waveforms

Stephen Cox, Richard Brady, Peter Jackson

We describe techniques used in the development of a high-accuracy automatic annotation system designed to provide new voices for a concatenative speech synthesiser. We have used standard HMM-based "forced alignment" techniques and have concentrated on refining both acoustic and pronunciation modelling to achieve greater alignment accuracy. Acoustic models were improved by Bayesian speaker adaptation and the use of confidence measures from N-Best decodings to produce speaker-dependent HMMs. Pronunciation modelling improvements involved the use of a large pronunciation dictionary containing multiple pronunciations for many words, the use of pronunciation probabilities, the accommodation of interword silences, and the use of information derived from existing manual annotations to guide the recogniser during decoding. The system produces time-aligned phonetic annotations for UK accents in which the automatic and manual alignments agree on the segmental labelling 93% of the time and in which the boundaries have an r.m.s. error of 14.5 ms from the manual boundaries.
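The two figures quoted above (segmental label agreement and r.m.s. boundary error) can be illustrated with a minimal sketch. The abstract does not describe how the scoring was carried out, so the following is an assumption: it supposes the manual and automatic alignments already contain the same number of segments in one-to-one correspondence, and it computes the r.m.s. error over all segment end boundaries. The function name `compare_alignments` and the example data are hypothetical.

```python
import math

def compare_alignments(manual, automatic):
    """Compare two alignments given as lists of (label, end_time_ms) tuples.

    Assumption: both lists have the same length and segment i in one list
    corresponds to segment i in the other.
    Returns (label_agreement, rms_boundary_error_ms).
    """
    if len(manual) != len(automatic):
        raise ValueError("alignments must contain the same number of segments")
    if not manual:
        raise ValueError("alignments must not be empty")

    # Fraction of segments whose phonetic labels are identical.
    matches = sum(1 for (m_label, _), (a_label, _) in zip(manual, automatic)
                  if m_label == a_label)
    agreement = matches / len(manual)

    # Root-mean-square difference between corresponding end boundaries.
    squared = [(a_end - m_end) ** 2
               for (_, m_end), (_, a_end) in zip(manual, automatic)]
    rms = math.sqrt(sum(squared) / len(squared))
    return agreement, rms

# Hypothetical example data, not taken from the paper.
manual = [("sil", 120.0), ("k", 190.0), ("ae", 310.0), ("t", 400.0)]
automatic = [("sil", 130.0), ("k", 180.0), ("eh", 310.0), ("t", 420.0)]
agreement, rms = compare_alignments(manual, automatic)
```

In practice the two label sequences may differ in length (insertions and deletions), in which case a sequence alignment step would be needed before boundaries can be compared; that step is omitted here.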


doi: 10.21437/ICSLP.1998-22

Cite as: Cox, S., Brady, R., Jackson, P. (1998) Techniques for accurate automatic annotation of speech waveforms. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0466, doi: 10.21437/ICSLP.1998-22

@inproceedings{cox98_icslp,
  author={Stephen Cox and Richard Brady and Peter Jackson},
  title={{Techniques for accurate automatic annotation of speech waveforms}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0466},
  doi={10.21437/ICSLP.1998-22}
}