INTERSPEECH 2004 - ICSLP
In this paper, we describe the initial stage of constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. To achieve this objective, the first step is to construct a multilingual phonetic alphabet, namely the Formosa Phonetic Alphabet (ForPA). In addition, the multilingual lexicons (Formosa Lexicons) are an important part of building the corpus. The corpus, a speech database of 2,300 speakers, has recently been completed and is ready for release. It contains about 200 hours of speech and 404,000 utterances.
Bibliographic reference. Liang, Min-siong / Lyu, Dau-cheng / Chiang, Yuang-chin / Lyu, Renyuan (2004): "Construct a multi-lingual speech corpus in Taiwan with extracting phonetically balanced articles", In INTERSPEECH-2004, 2737-2740.