7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper describes recent work towards the development of new corpora and tools for Turkish speech research. This effort represents an ongoing collaboration between the Center for Spoken Language Research (CSLR) at the University of Colorado and the Department of Electrical Engineering at the Middle East Technical University (METU). A new text corpus developed from Turkish newspaper text is described. In addition, a 193-speaker audio corpus and a pronunciation lexicon for the Turkish language have been developed. We then describe our initial work towards porting Sonic, the CSLR speech recognition system, to the Turkish language. Results are shown for phonetic alignment and phoneme recognition accuracy using the newly constructed corpus and speech tools. It is shown that 91.2% of the automatically labeled phoneme boundaries are placed within 20 msec of the hand-labeled locations for the Turkish audio corpus. Finally, a phoneme recognition error rate of 29.3% is demonstrated.
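The two figures reported above (boundary agreement within 20 msec and phoneme error rate) can be illustrated with a minimal sketch. It is not the scoring procedure used with Sonic; the function names, the one-to-one pairing of automatic and hand-labeled boundaries, the inclusive tolerance, and the edit-distance definition of error rate are all assumptions made for illustration.

```python
def boundary_agreement(auto_ms, hand_ms, tolerance_ms=20.0):
    """Fraction of automatic boundaries within tolerance_ms of the paired hand label.

    Assumes both lists hold boundary times in milliseconds and are already
    paired one-to-one (same phoneme sequence); the tolerance is inclusive.
    """
    if len(auto_ms) != len(hand_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_ms:
        raise ValueError("no boundaries to score")
    hits = sum(1 for a, h in zip(auto_ms, hand_ms) if abs(a - h) <= tolerance_ms)
    return hits / len(auto_ms)


def phoneme_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / len(reference).

    Computed as the Levenshtein distance between the two phoneme sequences,
    which is an assumed definition of the error rate.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical toy data, not taken from the corpus.
agreement = boundary_agreement([100, 250, 400, 610], [110, 245, 430, 600])
per = phoneme_error_rate(list("merhaba"), list("merabe"))
```

In this toy data three of the four boundaries fall within the tolerance, and the hypothesis differs from the reference by one deletion and one substitution.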
Bibliographic reference. Salor, Özgül / Pellom, Bryan / Çiloglu, Tolga / Hacioglu, Kadri / Demirekler, Mübeccel (2002): "On developing new text and audio corpora and speech recognition tools for the Turkish language", In ICSLP-2002, 349-352.