11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Constructing Japanese Test Collections for Spoken Term Detection

Yoshiaki Itoh (1), Hiromitsu Nishizaki (2), Xinhui Hu (3), Hiroaki Nanjo (4), Tomoyosi Akiba (5), Tatsuya Kawahara (6), Seiichi Nakagawa (5), Tomoko Matsui (7), Yoichi Yamashita (8), Kiyoaki Aikawa (9)

(1) Iwate Prefectural University, Japan
(2) University of Yamanashi, Japan
(3) NICT, Japan
(4) Ryukoku University, Japan
(5) Toyohashi University of Technology, Japan
(6) Kyoto University, Japan
(7) Institute of Statistical Mathematics, Japan
(8) Ritsumeikan University, Japan
(9) Tokyo University of Technology, Japan

Spoken Document Retrieval (SDR) and Spoken Term Detection (STD) have been among the hottest topics in the spoken document processing community. TREC (Text Retrieval Conference) has dealt with SDR since 1996 [1], and NIST has already set up STD test collections and collected the results of participants [2]. Japanese spoken document processing has also needed such test collections for SDR and STD. We set up a working group for this purpose in SIG-SLP (Spoken Language Processing) of the Information Processing Society of Japan. The working group has constructed and released a test collection for SDR [3]. We are now constructing new test collections for STD that will be made available to researchers. This paper introduces the policy, outline, and schedule of the new test collections, and makes some comparisons with the NIST STD tasks.

Bibliographic reference. Itoh, Yoshiaki / Nishizaki, Hiromitsu / Hu, Xinhui / Nanjo, Hiroaki / Akiba, Tomoyosi / Kawahara, Tatsuya / Nakagawa, Seiichi / Matsui, Tomoko / Yamashita, Yoichi / Aikawa, Kiyoaki (2010): "Constructing Japanese test collections for spoken term detection", in INTERSPEECH-2010, 677-680.