Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
The OGI Multi-language Telephone Speech Corpus is designed to support research on automatic language identification and multi-language speech recognition. The corpus consists of up to nine separate responses from each caller, ranging from single words to short topic-specific descriptions to 60 seconds of unconstrained spontaneous speech. The utterances were spoken over commercial telephone lines by speakers of English, Farsi (Persian), French, German, Japanese, Korean, Mandarin Chinese, Spanish, Tamil, and Vietnamese. We have completed the initial phase of our data acquisition effort: the recording and initial verification of utterances produced by 100 different speakers in each of the 10 languages. We describe the recording protocol, data collection procedure, ongoing corpus development, preliminary results of the statistical evaluation of the 10 languages, and plans to provide orthographic transcriptions of the speech.
Bibliographic reference. Muthusamy, Yeshwant K. / Cole, Ronald A. / Oshika, Beatrice T. (1992): "The OGI multi-language telephone speech corpus", In ICSLP-1992, 895-898.