International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Towards Retrieval of Video Archives Based on the Speech Content

Mei-Fang Huang, Kuan-Ting Chen, Hsin-Min Wang

Academia Sinica, Taipei, Taiwan

Huge collections of video and audio recordings that captured events of the last century remain an untapped resource of historical value. Accordingly, many digital library projects worldwide are studying how multimedia digital libraries can be established and used. In this paper, we report some findings from our recent work towards speech-content-based retrieval of video archives on Taiwan’s humanities and social activities. We are currently focusing on recordings about Taiwan’s aboriginal peoples. With acoustic models trained on broadcast news speech and language models trained on newswire text, recognition accuracy is disappointingly low: 15.92% for syllables and 8.18% for characters. After applying model adaptation techniques with domain-specific speech and text corpora, we improve these accuracies to 30.04% and 22.08%, respectively. Although these accuracies remain unsatisfactory, we found that it is still feasible to build a speech retrieval system for the target video archives.



Bibliographic reference. Huang, Mei-Fang / Chen, Kuan-Ting / Wang, Hsin-Min (2002): "Towards retrieval of video archives based on the speech content", in ISCSLP 2002, paper 53.