5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

An Algorithm for Automatic Generation of Mandarin Phonetic Balanced Corpus

Jyh-Shing Shyuu, Wang Jhing-Fa

Department of Computer Science and Information Engineering, Taiwan

This paper proposes an algorithm for the automatic generation of a Mandarin phonetically balanced corpus. The design of a phonetically balanced corpus is particularly important when collecting a continuous speech database intended to reduce co-articulation effects in continuous speech recognition (CSR). Traditionally, balanced corpora are generated manually or semi-automatically. The proposed algorithm tries to find a minimum number of sentences from a large text corpus while ensuring that the 408 Mandarin base syllables and the 38*22 co-articulations between vowels and consonants are distributed in the extracted sentences.
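The abstract does not describe the selection procedure itself. Choosing the fewest sentences that together contain a fixed set of phonetic units is an instance of the set-cover problem, so one common way to approximate it is a greedy heuristic. The sketch below illustrates that heuristic only and is not the authors' algorithm. The function name `greedy_select`, the sentence identifiers, and the toy unit labels are all hypothetical, and it assumes each sentence has already been transcribed into the set of units (base syllables and vowel-consonant transitions) it contains.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    sentence_units: Dict[str, Set[str]],
    targets: Set[str],
) -> Tuple[List[str], Set[str]]:
    """Greedy set-cover heuristic (an assumed stand-in, not the paper's method).

    sentence_units maps a sentence ID to the phonetic units found in it.
    targets is the set of units the extracted corpus should contain.
    Returns the chosen sentence IDs and any targets that never occur.
    """
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered:
        # Pick the sentence that adds the most still-missing units.
        best = max(sentence_units, key=lambda s: len(sentence_units[s] & uncovered))
        gain = sentence_units[best] & uncovered
        if not gain:
            break  # the remaining units do not appear anywhere in the corpus
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Toy example with made-up sentence IDs and syllable labels.
corpus = {
    "s1": {"ma", "ni", "hao"},
    "s2": {"hao", "de"},
    "s3": {"ni", "de", "shi"},
    "s4": {"ma"},
}
selected, missing = greedy_select(corpus, {"ma", "ni", "hao", "de", "shi"})
print(selected, missing)
```

In a setting like the one the abstract describes, `targets` would hold the 408 base syllables together with the 38*22 = 836 vowel-consonant co-articulations. The greedy choice does not guarantee a true minimum, but it returns a small covering set and reports any units the source text never contains.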


Bibliographic reference.  Shyuu, Jyh-Shing / Jhing-Fa, Wang (1998): "An algorithm for automatic generation of Mandarin phonetic balanced corpus", In ICSLP-1998, paper 0960.