Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper extends our previous work on automatic language identification using 4 languages and high-quality speech, to automatic identification of 10 languages using telephone speech. The systems described here consist of two parts: (a) segmentation of telephone speech into seven broad phonetic categories and (b) classification of languages using feature measurements derived from the broad phonetic categories. Both the segmentation and classification stages use fully connected, feed-forward neural networks. When tested on new speakers from the 10 languages, the multi-language segmentation algorithm agrees with the hand labels 79.8% of the time. Classifiers were trained to identify (i) all 10 languages, (ii) each language vs. all others, (iii) the pairs English-L, where L is one of the remaining 9 languages, and (iv) the triples English-L-Other, where Other consists of the remaining 8 languages. Performance varied from 47.7% for the single 10-language network to 88.6% for the English-Tamil network. Classification performance of human listeners on short excerpts of speech is also reported.
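The four classifier configurations listed in the abstract can be read as four ways of relabeling the same 10-language data. The following is a minimal sketch of that relabeling only, not the authors' implementation. The abstract names only English and Tamil, so the other eight language names (`Lang3` to `Lang10`) are placeholders, and the function name `build_tasks` and the task-name strings are hypothetical.

```python
from typing import Dict, List

# Placeholder language list: only English and Tamil are named in the abstract;
# Lang3..Lang10 are stand-ins for the other eight languages (assumption).
LANGUAGES: List[str] = ["English", "Tamil"] + [f"Lang{i}" for i in range(3, 11)]


def build_tasks(languages: List[str], anchor: str = "English") -> Dict[str, Dict[str, str]]:
    """Map each task name to a {language: class label} table.

    A language that is absent from a task's table is not used by that task.
    """
    tasks: Dict[str, Dict[str, str]] = {}

    # (i) one network that separates all languages
    tasks["all"] = {lang: lang for lang in languages}

    # (ii) each language vs. all others (two classes per task)
    for lang in languages:
        tasks[f"{lang}-vs-rest"] = {
            other: (lang if other == lang else "Other") for other in languages
        }

    non_anchor = [lang for lang in languages if lang != anchor]

    # (iii) pairs English-L: only the two named languages take part
    for lang in non_anchor:
        tasks[f"{anchor}-{lang}"] = {anchor: anchor, lang: lang}

    # (iv) triples English-L-Other: the remaining languages are pooled as "Other"
    for lang in non_anchor:
        tasks[f"{anchor}-{lang}-Other"] = {
            other: (other if other in (anchor, lang) else "Other")
            for other in languages
        }

    return tasks


tasks = build_tasks(LANGUAGES)
```

Under these assumptions the sketch yields 1 ten-way task, 10 one-vs-rest tasks, 9 pair tasks, and 9 triple tasks. In each triple task, 8 languages share the pooled `Other` label, matching the description of Other in the abstract.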
Bibliographic reference. Muthusamy, Yeshwant K. / Cole, Ronald A. (1992): "Automatic segmentation and identification of ten languages using telephone speech", In ICSLP-1992, 1007-1010.