The advent of the information age has brought massive digital libraries of multimedia content. This development creates a high demand for information indexing and retrieval technologies, and the ability to browse audio archives is much desired. This paper reports on our initial attempt at using syllable units for Chinese spoken document retrieval. Our experiments are based on 1861 news stories from local television broadcasts in Cantonese, a monosyllabic Chinese dialect with a rich tonal structure. Results show that indexing with overlapping bisyllables (tonal syllables) mapped from text delivers the reference retrieval performance, with an average inverse rank (AIR) of 0.830. Retrieval based on overlapping bisyllables (base syllables) recognized from audio achieved an AIR of 0.460.
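The two ideas named in the abstract, overlapping bisyllable indexing terms and the AIR score, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an overlapping bisyllable is each pair of adjacent syllables taken with a one-syllable step, and that AIR is the mean over queries of 1/rank of the single target document, with a missing target contributing 0. The function names and the romanized example syllables are hypothetical.

```python
from typing import List, Sequence, Tuple


def overlapping_bisyllables(syllables: Sequence[str]) -> List[Tuple[str, str]]:
    """Return every adjacent syllable pair, advancing one syllable at a time."""
    return list(zip(syllables, syllables[1:]))


def average_inverse_rank(rankings: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Mean of 1/rank of each query's target document (ranks start at 1).

    Assumption: one target per query; a target absent from the ranked
    list contributes 0 to the sum.
    """
    total = 0.0
    for ranked, target in zip(rankings, targets):
        if target in ranked:
            total += 1.0 / (list(ranked).index(target) + 1)
    return total / len(targets)


# Hypothetical tonal syllables (digits mark tone); base syllables would drop them.
terms = overlapping_bisyllables(["hoeng1", "gong2", "san1", "man4"])

# Two hypothetical queries: the target is ranked 1st, then 2nd.
air = average_inverse_rank([["d1", "d2"], ["d3", "d1", "d2"]], ["d1", "d1"])
```

Under these assumptions, a four-syllable sequence yields three indexing terms, and a target ranked first for one query and second for another gives an AIR of 0.75.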
Cite as: Meng, H.M., Lo, W.K., Li, Y.C., Ching, P.C. (2000) Multi-scale audio indexing for Chinese spoken document retrieval. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 101-104, doi: 10.21437/ICSLP.2000-761
@inproceedings{meng00c_icslp, author={Helen M. Meng and W. K. Lo and Yuk Chi Li and P. C. Ching}, title={{Multi-scale audio indexing for Chinese spoken document retrieval}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, volume={4}, pages={101--104}, doi={10.21437/ICSLP.2000-761} }