Sixth International Conference on Spoken Language Processing
An automatic transcription system has been developed to label and segment the phonetic constituents of spontaneous American English without the benefit of a word-level transcript. Instead, special-purpose neural networks classify each 10-ms frame of speech in terms of articulatory-acoustic-based phonetic features, and the resulting feature clusters are then mapped to phonetic-segment labels using multilayer perceptron networks. The phonetic labels generated by this system are 80% concordant with the labels produced by human transcribers, and the segmental boundaries deviate from manual segmentation by an average of 11 ms. The automatic transcription system thus generates phonetic labels and segmentation comparable in quality to those produced by human transcribers. It may therefore prove useful for the phonetic annotation of novel linguistic corpora, as well as for facilitating the development of pronunciation models for automatic speech recognition systems.
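The abstract reports two quality figures: label agreement with human transcribers and average boundary deviation. It also states that labels are assigned per 10-ms frame. The sketch below shows one plausible way to turn per-frame phone labels into timed segments and score them against a manual transcription. It is a minimal illustration under stated assumptions, not the authors' evaluation procedure, which the abstract does not describe. Three choices are assumptions: measuring agreement frame by frame, measuring boundary deviation as the distance from each manual boundary to the nearest automatic one, and all function names (`frames_to_segments`, `frame_concordance`, `mean_boundary_deviation`), which are hypothetical.

```python
from typing import List, Sequence, Tuple

FRAME_MS = 10  # frame step stated in the abstract

# Hypothetical segment representation: (phone label, start_ms, end_ms)
Segment = Tuple[str, int, int]


def frames_to_segments(frame_labels: Sequence[str], frame_ms: int = FRAME_MS) -> List[Segment]:
    """Merge runs of identical per-frame phone labels into timed segments."""
    segments: List[Segment] = []
    for i, label in enumerate(frame_labels):
        start = i * frame_ms
        if segments and segments[-1][0] == label:
            # Same phone as the previous frame: extend the current segment.
            prev_label, prev_start, _ = segments[-1]
            segments[-1] = (prev_label, prev_start, start + frame_ms)
        else:
            # Label changed: a segment boundary falls at this frame's start.
            segments.append((label, start, start + frame_ms))
    return segments


def frame_concordance(auto: Sequence[str], manual: Sequence[str]) -> float:
    """Fraction of frames on which automatic and manual labels agree (assumed metric)."""
    if len(auto) != len(manual) or not auto:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(a == m for a, m in zip(auto, manual)) / len(auto)


def mean_boundary_deviation(auto: Sequence[Segment], manual: Sequence[Segment]) -> float:
    """Average distance (ms) from each manual boundary to the nearest automatic one (assumed metric)."""
    # Internal boundaries only: the end time of every segment except the last.
    auto_b = [end for _, _, end in auto[:-1]]
    manual_b = [end for _, _, end in manual[:-1]]
    if not auto_b or not manual_b:
        raise ValueError("both transcriptions need at least two segments")
    return sum(min(abs(m - a) for a in auto_b) for m in manual_b) / len(manual_b)
```

Consider a toy 16-frame utterance. The manual labels are `["sil"]*3 + ["k"]*4 + ["ae"]*6 + ["t"]*3`, and the automatic labels are `["sil"]*4 + ["k"]*3 + ["ae"]*7 + ["t"]*2`. The frames agree on 14 of 16 positions (0.875). The manual boundaries at 30, 70, and 130 ms lie 10, 0, and 10 ms from the nearest automatic boundary, which gives a mean deviation of about 6.7 ms. Nearest-boundary matching is used here because it does not require the two transcriptions to contain the same number of segments.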
Bibliographic reference. Chang, Shuangyu / Shastri, Lokendra / Greenberg, Steven (2000): "Automatic phonetic transcription of spontaneous speech (American English)", In ICSLP-2000, vol. 4, 330-333.