5th International Conference on Spoken Language Processing
This paper proposes an algorithm for the automatic generation of a phonetically balanced Mandarin corpus. The design of a phonetically balanced corpus is particularly important when collecting a continuous speech database, in order to reduce co-articulation effects in continuous speech recognition (CSR). Traditionally, balanced corpora have been generated manually or semi-automatically. The proposed algorithm tries to find a minimum number of sentences from a large text corpus while ensuring that the 408 Mandarin base syllables and the 38 × 22 co-articulations between vowels and consonants are distributed across the extracted sentences.
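The abstract does not describe the selection procedure itself. One common way to approach this kind of "fewest sentences covering all target units" problem is a greedy set-cover heuristic, sketched below purely as an illustration and not as the authors' method. The function name `greedy_select`, the sentence identifiers, and the unit labels are all hypothetical; in practice each sentence would first be converted into the set of base syllables and vowel-consonant co-articulation units it contains.

```python
def greedy_select(sentences, targets):
    """Greedy set-cover sketch (an assumed stand-in, not the paper's algorithm).

    sentences: dict mapping a sentence id to the set of phonetic units it contains
    targets:   set of units (syllables, co-articulations) that must be covered
    Returns (chosen sentence ids in pick order, units that no sentence covers).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the sentence that covers the most still-uncovered units.
        best = max(sentences, key=lambda sid: len(sentences[sid] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:
            # No remaining sentence adds coverage; stop and report the gap.
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Toy data: unit labels are made up for illustration only.
corpus = {
    "s1": {"ba", "ma", "a-m"},
    "s2": {"ma", "ni"},
    "s3": {"ba", "ma", "ni", "a-n"},
    "s4": {"hao"},
}
targets = {"ba", "ma", "ni", "hao", "a-n"}
chosen, missing = greedy_select(corpus, targets)
print(chosen, missing)
```

The greedy choice does not guarantee a true minimum, since minimum set cover is NP-hard, but it typically yields a small selection and reports any target units that the text corpus cannot supply.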
Bibliographic reference. Shyuu, Jyh-Shing / Wang, Jhing-Fa (1998): "An algorithm for automatic generation of Mandarin phonetic balanced corpus", In ICSLP-1998, paper 0960.