Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Using Machine Learning Method and Subword Unit Representations for Spoken Document Categorization

Weidong Qu, Katsuhiko Shirai

Department of Information and Computer Science, Waseda University, Tokyo, Japan

In this paper, we investigate the feasibility of using a machine learning method and subword units for spoken document categorization, as an alternative to using words generated by word recognition or keyword spotting. An advantage of subword acoustic unit representations for spoken document categorization is that they require no prior knowledge of the contents of the spoken document and can address the out-of-vocabulary (OOV) problem. The context-sensitive learning method is efficient on large, noisy corpora and is well suited to subword-based categorization. Because even the best phone recognizers make a large number of mistakes, we use phone lattices to obtain the bag of phone N-grams for each speech document, which improves phone N-gram recall. In this study, we examine a variety of subword units as categorization terms and measure how effectively they support categorization. We also investigate performance when the underlying phonetic transcriptions contain different recognition errors.
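As an illustration of the bag of phone N-grams mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that the phone lattice has already been expanded into a small list of alternative phone sequences (a simplification of a real lattice), and the function names `phone_ngrams` and `bag_of_phone_ngrams` and the phone labels are hypothetical. An N-gram that appears in any alternative enters the bag, which is how alternatives beyond the single best transcription can raise N-gram recall.

```python
from collections import Counter


def phone_ngrams(phones, n):
    """Return all contiguous phone n-grams of length n as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def bag_of_phone_ngrams(hypotheses, n_values=(1, 2, 3)):
    """Build a bag of phone n-grams from alternative phone sequences.

    Assumption: `hypotheses` is a list of phone sequences standing in
    for the paths of a phone lattice. For each n-gram, the bag keeps the
    largest count seen in any single hypothesis, so alternatives add
    coverage without inflating counts of n-grams they share.
    """
    bag = Counter()
    for phones in hypotheses:
        local = Counter()
        for n in n_values:
            local.update(phone_ngrams(phones, n))
        # Counter union keeps the maximum count per n-gram.
        bag = bag | local
    return bag


# Two competing transcriptions of the same (hypothetical) utterance.
hypotheses = [["k", "ae", "t"], ["k", "ah", "t"]]
bigram_bag = bag_of_phone_ngrams(hypotheses, n_values=(2,))
```

Here `bigram_bag` holds the bigrams from both alternatives, so a categorizer can still match `("k", "ae")` even if the top-ranked transcription contained `("k", "ah")` instead.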


Bibliographic reference. Qu, Weidong / Shirai, Katsuhiko (2000): "Using machine learning method and subword unit representations for spoken document categorization", in ICSLP-2000, vol. 3, 1065-1068.