ISCA Archive Interspeech 2013

KPCatcher — a keyphrase extraction system for enterprise videos

Yongxin Taylor Xi, Matthias Paulik, Venkata Ramana Gadde, Ananth Sankar

This paper introduces KPCatcher (keyphrase catcher). The value of our work lies in providing concrete solutions for building a real keyphrase extraction product for enterprise videos. KPCatcher has been designed to robustly extract a ranked list of keyphrases from enterprise videos, independent of the domain. It treats noun phrases in the transcript as candidate keyphrases and scores them by aggregating word-level scores. By using confidence-based and counting-based rules, KPCatcher handles transcription errors to prevent incorrect keyphrases from being surfaced to end users. Unlike previous work, we focus our experiments on automatic transcriptions of real enterprise videos from various domains. We thoroughly evaluate several well-known keyword ranking features and the denoising rules, using enterprise videos from several domains at various word error rates. We find term frequency to be the best feature and show that our denoising rules are very effective both in rejecting incorrect keyphrases and in increasing the overlap between top keyphrases and human-provided keyphrases. We also show that KPCatcher compares favorably to existing research systems on ICSI meeting data.
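The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the aggregation function, the denoising rules, or any thresholds, so the following are all assumptions made for illustration: the function name `rank_keyphrases`, summing word-level term frequencies as the aggregation, a confidence rule that rejects a phrase whose best occurrence still contains a word below `min_conf`, a counting rule that rejects phrases seen fewer than `min_count` times, and both default threshold values. Noun-phrase detection is assumed to have happened upstream, so candidates arrive as lists of `(word, confidence)` pairs.

```python
from collections import Counter


def rank_keyphrases(transcript_words, candidates, min_conf=0.5, min_count=2):
    """Rank candidate phrases by summed word-level term frequency.

    transcript_words: list of recognized words (str).
    candidates: list of phrases, each a list of (word, asr_confidence) pairs.
    min_conf, min_count: hypothetical thresholds, not taken from the paper.
    """
    # Word-level score: plain term frequency over the transcript.
    tf = Counter(w.lower() for w in transcript_words)

    occurrences = Counter()   # how often each candidate phrase occurs
    best_conf = {}            # best per-occurrence minimum word confidence
    for cand in candidates:
        if not cand:
            continue
        phrase = " ".join(w.lower() for w, _ in cand)
        occurrences[phrase] += 1
        weakest_word = min(conf for _, conf in cand)
        best_conf[phrase] = max(best_conf.get(phrase, 0.0), weakest_word)

    ranked = []
    for phrase, count in occurrences.items():
        # Counting-based rule (assumed form): drop rarely seen phrases.
        if count < min_count:
            continue
        # Confidence-based rule (assumed form): drop likely misrecognitions.
        if best_conf[phrase] < min_conf:
            continue
        # Phrase score: aggregate (here, sum) the word-level scores.
        score = sum(tf[w] for w in phrase.split())
        ranked.append((phrase, score))

    # Highest score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Summation favors longer phrases; averaging the word scores would be an equally plausible aggregation under the abstract's description, and the choice here is arbitrary.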


doi: 10.21437/Interspeech.2013-462

Cite as: Xi, Y.T., Paulik, M., Gadde, V.R., Sankar, A. (2013) KPCatcher — a keyphrase extraction system for enterprise videos. Proc. Interspeech 2013, 1906-1910, doi: 10.21437/Interspeech.2013-462

@inproceedings{xi13_interspeech,
  author={Yongxin Taylor Xi and Matthias Paulik and Venkata Ramana Gadde and Ananth Sankar},
  title={{KPCatcher — a keyphrase extraction system for enterprise videos}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1906--1910},
  doi={10.21437/Interspeech.2013-462}
}