This paper reports on two methods for making Cantonese spoken document retrieval more robust. Our experimental corpus contains 60 hours of Cantonese television news broadcasts with over 1600 news stories. These spoken documents are indexed by automatic speech recognition of Cantonese base syllables. Recognition performance degrades significantly as we move from anchor speech recorded in the studio to reporter/interviewee speech recorded in the field, and the resulting recognition errors affect retrieval performance. We devised two methods to reduce the adverse effects of speech recognition errors on retrieval: (1) developing techniques to automatically extract studio speech from the audio tracks and using only these segments in retrieval; and (2) using N-best recognition hypotheses for document expansion prior to retrieval. Results indicate that (i) the best method for automatically extracting studio speech segments fuses audio-based segmentation with video-based segmentation; (ii) using only the studio speech segments for our known-item retrieval task does not necessarily improve retrieval performance, since doing so discards approximately three quarters of the audio in our corpus; and (iii) using N-best recognition hypotheses for document expansion can bring about further improvements in retrieval performance, attaining an average inverse rank of 0.654.
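The two quantities named in the abstract, document expansion from N-best hypotheses and average inverse rank, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the indexing terms or the weighting scheme, so the use of overlapping syllable bigrams, the unweighted pooling of hypotheses, and all function names and example syllables below are assumptions made for illustration. Average inverse rank is taken here in its usual sense for known-item retrieval, the mean over queries of one divided by the rank at which the target document is returned.

```python
from collections import Counter


def syllable_bigrams(syllables):
    """Return overlapping syllable bigrams (assumed indexing terms)."""
    return list(zip(syllables, syllables[1:]))


def expand_document(nbest):
    """Pool the terms of all N hypotheses into one term-count vector.

    `nbest` is a list of hypotheses, each a list of base syllables.
    Every hypothesis is weighted equally here (an assumption).
    """
    counts = Counter()
    for hypothesis in nbest:
        counts.update(syllable_bigrams(hypothesis))
    return counts


def average_inverse_rank(ranks):
    """Mean of 1/rank over queries.

    `ranks` holds the 1-based rank of the known item for each query,
    or None when the item was not retrieved (contributes 0).
    """
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


# Hypothetical 2-best output for one spoken document.
nbest = [["hoeng", "gong", "jan"], ["hoeng", "kong", "jan"]]
expanded = expand_document(nbest)

# Known item ranked 1st, 2nd, and 4th for three hypothetical queries.
air = average_inverse_rank([1, 2, 4])
```

In this sketch the expanded document contains terms from the second-best hypothesis as well as the first, so a query can still match the document if the correct syllable appears only in a lower-ranked hypothesis.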
Cite as: Hui, P.Y., Lo, W.K., Meng, H.M. (2003) Two robust methods for Cantonese spoken document retrieval. Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003), 7-12
@inproceedings{hui03_msdr, author={Pui Yu Hui and Wai Kit Lo and Helen M. Meng}, title={{Two robust methods for Cantonese spoken document retrieval}}, year=2003, booktitle={Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003)}, pages={7--12} }