Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Automatic Phonetic Transcription of Spontaneous Speech (American English)

Shuangyu Chang, Lokendra Shastri, Steven Greenberg

International Computer Science Institute, Berkeley, CA, USA

An automatic transcription system has been developed to label and segment the phonetic constituents of spontaneous American English without the benefit of a word-level transcript. Instead, special-purpose neural networks classify each 10-ms frame of speech in terms of articulatory-acoustic phonetic features, and the resulting feature clusters are then mapped to phonetic-segment labels using multilayer perceptron networks. The phonetic labels generated by this system are 80% concordant with the labels produced by human transcribers, and the segmental boundaries deviate from manual segmentation by an average of 11 ms. The automatic transcription system thus generates phonetic labels and segmentation comparable in quality to those produced by human transcribers. It may therefore prove useful for the phonetic annotation of novel linguistic corpora, and may also facilitate the development of pronunciation models for automatic speech recognition systems.
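The two evaluation figures quoted above (label concordance and boundary deviation) can be illustrated with a minimal sketch. The abstract does not say how either figure was computed, so everything here is an assumption made for illustration: concordance is taken as frame-level agreement, boundary deviation is taken as the mean distance from each manual boundary to the nearest automatic boundary, and the function names and toy labels are hypothetical. Only the 10-ms frame step comes from the abstract.

```python
FRAME_MS = 10  # frame step stated in the abstract


def collapse_frames(frame_labels, frame_ms=FRAME_MS):
    """Merge runs of identical frame labels into (label, start_ms, end_ms) segments."""
    segments = []
    for i, label in enumerate(frame_labels):
        start = i * frame_ms
        if segments and segments[-1][0] == label:
            # Same label as the previous frame: extend the current segment.
            prev_label, prev_start, _ = segments[-1]
            segments[-1] = (prev_label, prev_start, start + frame_ms)
        else:
            segments.append((label, start, start + frame_ms))
    return segments


def frame_concordance(auto_frames, manual_frames):
    """Fraction of frames on which the two label sequences agree (assumed metric)."""
    if len(auto_frames) != len(manual_frames) or not auto_frames:
        raise ValueError("label sequences must be non-empty and equally long")
    agree = sum(a == m for a, m in zip(auto_frames, manual_frames))
    return agree / len(auto_frames)


def mean_boundary_deviation(auto_segments, manual_segments):
    """Mean distance in ms from each manual internal boundary to the nearest
    automatic internal boundary (assumed metric)."""
    auto_bounds = [end for _, _, end in auto_segments[:-1]]
    manual_bounds = [end for _, _, end in manual_segments[:-1]]
    if not auto_bounds or not manual_bounds:
        raise ValueError("each segmentation needs at least one internal boundary")
    total = sum(min(abs(m - a) for a in auto_bounds) for m in manual_bounds)
    return total / len(manual_bounds)


# Toy data (hypothetical labels, not taken from the paper).
manual = ["dh"] * 3 + ["ax"] * 5 + ["k"] * 4
auto = ["dh"] * 4 + ["ax"] * 5 + ["k"] * 3

print(collapse_frames(auto))
print(frame_concordance(auto, manual))
print(mean_boundary_deviation(collapse_frames(auto), collapse_frames(manual)))
```

In this toy case the automatic labels place both boundaries one frame late, so 10 of the 12 frames agree and each manual boundary lies 10 ms from the nearest automatic one. A segment-level scoring scheme (for example, one based on aligning the two label strings) would give different numbers, which is why the metric definitions above should be read as one possible choice rather than the paper's procedure.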

Bibliographic reference. Chang, Shuangyu / Shastri, Lokendra / Greenberg, Steven (2000): "Automatic phonetic transcription of spontaneous speech (American English)", In ICSLP-2000, vol. 4, 330-333.