Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

MAT-2000 - Design, Collection, and Validation of a Mandarin 2000-Speaker Telephone Speech Database

Hsiao-Chuan Wang, Frank Seide (1), Chiu-Yu Tseng, Lin-Shan Lee

Association for Computational Linguistics and Chinese Language Processing, Taipei, Taiwan

Mandarin speech data Across Taiwan (MAT) is a project initiated by members of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP) to collect speech data through public telephone networks in Taiwan. In total, over 7000 Taiwanese individuals have provided speech data. The results were released to the research community in Taiwan as a series of MAT speech databases. Two of these databases, MAT-160 and MAT-400, were used for the first and second Assessment of Speech Recognition Technique in Taiwan. Preparation for the release of a larger database of over 2000 speakers, called MAT-2000, has now been completed. In this joint project conducted by ACLCLP and Philips Research East-Asia, considerable effort has been spent on validating the database to ensure its quality. MAT-2000 consists of over 80 hours of recordings and contains about 640,000 Mandarin syllables in over 140,000 speech files. These speech files are grouped into five sub-databases for different application purposes.

Bibliographic reference. Wang, Hsiao-Chuan / Seide, Frank / Tseng, Chiu-Yu / Lee, Lin-Shan (2000): "MAT-2000 - design, collection, and validation of a Mandarin 2000-speaker telephone speech database", in ICSLP-2000, vol. 4, 460-463.