16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Study of Entity-Topic Models for OOV Proper Name Retrieval

Imran Sheikh, Irina Illina, Dominique Fohr

LORIA, France

Retrieving Proper Names (PNs) relevant to an audio document can improve speech recognition and content-based audio-video indexing. The Latent Dirichlet Allocation (LDA) topic model has been used to retrieve Out-Of-Vocabulary (OOV) PNs relevant to an audio document with good recall rates. However, retrieval of OOV PNs using LDA is affected by two issues, which we study in this paper: (1) Word Frequency Bias (less frequent OOV PNs are ranked lower); (2) Loss of Specificity (the reduced topic space representation loses lexical context). Entity-Topic models have been proposed as extensions of LDA to specifically learn relations between words, entities (PNs) and topics. We study OOV PN retrieval with Entity-Topic models and show that they are also affected by word frequency bias and loss of specificity. We evaluate our proposed methods for rare OOV PN re-ranking and lexical context re-ranking for LDA as well as for Entity-Topic models. The results show an improvement in both Recall and Mean Average Precision.
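
As a rough illustration of the word frequency bias mentioned in the abstract, the sketch below assumes that candidate OOV PNs are scored for a document by summing p(PN | topic) * p(topic | document) over topics, which is a common way to rank words with an LDA model. The function name `rank_oov_pns`, the candidate names, and all probabilities are hypothetical toy values and are not taken from the paper.

```python
# Toy sketch of topic-based OOV PN ranking; all values are hypothetical.

def rank_oov_pns(topic_given_doc, pn_given_topic):
    """Rank candidate PNs by sum over topics of p(pn | t) * p(t | doc)."""
    scores = {
        pn: sum(p_t * probs[t] for t, p_t in enumerate(topic_given_doc))
        for pn, probs in pn_given_topic.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# p(t | doc) for a 2-topic model: the document is mostly about topic 0.
topic_given_doc = [0.9, 0.1]

# p(pn | t) per topic. "frequent_name" and "rare_name" both belong to
# topic 0, but the rare name was seen far less often in training, so it
# carries less probability mass under that topic.
pn_given_topic = {
    "frequent_name": [0.020, 0.001],
    "rare_name": [0.002, 0.0001],
    "off_topic_name": [0.001, 0.015],
}

ranking, scores = rank_oov_pns(topic_given_doc, pn_given_topic)
for pn in ranking:
    print(f"{pn}: {scores[pn]:.5f}")
```

With these toy numbers the rare but on-topic name scores below the off-topic name, which is the kind of effect a rare OOV PN re-ranking step would aim to correct.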


Bibliographic reference.  Sheikh, Imran / Illina, Irina / Fohr, Dominique (2015): "Study of entity-topic models for OOV proper name retrieval", In INTERSPEECH-2015, 1344-1348.