Language and channel variations are two important concerns currently affecting practical automatic language and speaker recognition performance. To address these challenges, a corpus of speech was collected from 100 bilingual speakers in each of three foreign languages (Arabic-English, Korean-English, and Spanish-English). The recordings were made in highly controlled conditions using multiple microphones simultaneously, each with different measured response characteristics. The speakers were asked to perform a set of speaking tasks including conversations, text-independent readings, and prescribed text readings. These tasks were performed in English and in each speaker's native language. The equipment, the recording procedures, and the data formats are presented, along with a preliminary analysis of recorded signal quality.
Cite as: Beck, S.D., Schwartz, R., Nakasone, H. (2004) A bilingual multi-modal voice corpus for language and speaker recognition (LASR) services. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 265-270
@inproceedings{beck04_odyssey,
  author={Steven D. Beck and Reva Schwartz and Hirotaka Nakasone},
  title={{A bilingual multi-modal voice corpus for language and speaker recognition (LASR) services}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={265--270}
}