Sixth International Conference on Spoken Language Processing
Mandarin speech data Across Taiwan (MAT) is a project initiated by members of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP) to collect speech data through public telephone networks in Taiwan. In total, more than 7000 Taiwanese individuals have provided speech data. The results were released to the research community in Taiwan as a series of MAT speech databases. Two of these databases, MAT-160 and MAT-400, were used for the first and second Assessment of Speech Recognition Technique in Taiwan. Preparation for the release of a larger database of over 2000 speakers, called MAT-2000, has now been completed. In this joint project, conducted by ACLCLP and Philips Research East-Asia, considerable effort went into validating the database to ensure its quality. MAT-2000 consists of over 80 hours of recordings and contains about 640,000 Mandarin syllables in over 140,000 speech files. These speech files are grouped into five sub-databases, each intended for a different application purpose.
Bibliographic reference. Wang, Hsiao-Chuan / Seide, Frank / Tseng, Chiu-Yu / Lee, Lin-Shan (2000): "MAT-2000 - design, collection, and validation of a Mandarin 2000-speaker telephone speech database", In ICSLP-2000, vol.4, 460-463.