ISCA Archive Interspeech 2013

Leveraging locality for topic identification of conversational speech

Jonathan Wintrode

We evaluate the limitations of the bag-of-words assumption for topic identification of conversational discourse by examining whether topic-dependent word occurrence statistics are also position-independent. We demonstrate where the assumption is violated in conversational speech corpora and show how the relevance of words to the classification task decreases over the length of the document. We seek to improve topic identification by modeling this topic drift phenomenon and weighting word counts according to a decay function over the length of the document. By applying a global decay rate for all words, we observe a reduction in error rates of 23.47% relative on conversational corpora. Furthermore, we apply a minimum classification error (MCE) training procedure to learn per-word decay rates, and reduce error rates by up to an additional 27%.
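
The position-weighted counting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the decay function, so the exponential decay over relative position used here is an assumption, as are the names `decayed_counts`, `decay_rate`, and `per_word_rates`. The sketch covers only the feature weighting, not the classifier or the MCE training procedure.

```python
import math
from collections import defaultdict


def decayed_counts(tokens, decay_rate=0.0, per_word_rates=None):
    """Return position-weighted word counts for one document.

    Assumption (not from the paper): each occurrence contributes
    exp(-rate * relative_position), where relative_position lies in [0, 1).
    A rate of 0 recovers ordinary bag-of-words counts.
    """
    per_word_rates = per_word_rates or {}
    n = len(tokens)
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        # Use a word-specific rate if one is given, else the global rate.
        rate = per_word_rates.get(word, decay_rate)
        counts[word] += math.exp(-rate * (i / n))
    return dict(counts)


# Hypothetical toy document: "music" appears early and late.
doc = ["music", "band", "weather", "music"]
plain = decayed_counts(doc)                   # standard counts
decayed = decayed_counts(doc, decay_rate=2.0)  # later words count less
```

With a positive decay rate, the second occurrence of "music" contributes less than the first, so words spoken early in the conversation carry more weight. Passing `per_word_rates` stands in for the per-word decay rates that the abstract says are learned with MCE training; here they would have to be supplied by hand.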


doi: 10.21437/Interspeech.2013-398

Cite as: Wintrode, J. (2013) Leveraging locality for topic identification of conversational speech. Proc. Interspeech 2013, 1579-1583, doi: 10.21437/Interspeech.2013-398

@inproceedings{wintrode13_interspeech,
  author={Jonathan Wintrode},
  title={{Leveraging locality for topic identification of conversational speech}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1579--1583},
  doi={10.21437/Interspeech.2013-398}
}