12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Using Latent Topic Features for Named Entity Extraction in Search Queries

Joe Polifroni, François Mairesse

Nokia Research Center Cambridge, USA

Search is one of the fastest-growing applications in the mobile market. As people rely more on portable devices to perform search, it becomes increasingly important to analyze user queries in order to return more targeted results across a broad set of search entities. While most previous work has relied on lexico-syntactic features and handcrafted knowledge sources, this paper investigates methods for learning latent semantic features from unlabelled user-generated content. We extract word-topic associations by training a Latent Dirichlet Allocation (LDA) model on a corpus of online reviews, and show that this information improves named-entity classification performance on broad-domain search queries. We believe that topical features provide a rich source of information that can be learned from data with minimal manual effort and without dependency on a specific language.
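The abstract does not specify the LDA implementation, hyperparameters, feature set, or classifier. As a rough illustration only, the sketch below assumes a collapsed Gibbs sampler for LDA and a simple per-token feature representation; the function names `train_lda`, `word_topic_vector` and `token_features`, the priors `alpha=0.1` and `beta=0.01`, and the toy "reviews" are all hypothetical choices, not the authors' method. It shows how word-topic associations learned from unlabelled review text could be turned into features for classifying query tokens.

```python
import random
from collections import defaultdict


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical LDA trainer using collapsed Gibbs sampling.

    docs: list of token lists (e.g. tokenized reviews).
    Returns topic-word counts, tokens per topic, and the vocabulary set.
    """
    rng = random.Random(seed)
    vocab = {w for doc in docs for w in doc}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, n_k, vocab


def word_topic_vector(word, n_kw, n_k, vocab, beta=0.01):
    """p(topic | word), assuming a uniform prior over topics."""
    K = len(n_k)
    if word not in vocab:
        return [1.0 / K] * K  # out-of-vocabulary: no topical evidence
    V = len(vocab)
    scores = [(n_kw[k].get(word, 0) + beta) / (n_k[k] + V * beta)
              for k in range(K)]
    total = sum(scores)
    return [s / total for s in scores]


def token_features(query, n_kw, n_k, vocab):
    """Per-token feature dicts: simple lexical features plus topic features."""
    tokens = query.lower().split()
    feats = []
    for i, tok in enumerate(tokens):
        f = {"word=" + tok: 1.0,
             "prev=" + (tokens[i - 1] if i > 0 else "<s>"): 1.0}
        for k, p in enumerate(word_topic_vector(tok, n_kw, n_k, vocab)):
            f["topic%d" % k] = p
        feats.append(f)
    return feats


if __name__ == "__main__":
    # Toy stand-in for a corpus of online reviews (hypothetical data).
    reviews = [
        "great pizza and pasta at this italian restaurant".split(),
        "the pasta was cold but the pizza was great".split(),
        "a thrilling movie with a great cast".split(),
        "the movie cast and director were superb".split(),
    ]
    n_kw, n_k, vocab = train_lda(reviews, num_topics=2, iters=100)
    for f in token_features("pizza near me", n_kw, n_k, vocab):
        print(f)
```

Here the topic vector for a word is its normalized topic-word probability, so words that co-occur in reviews (for example, dish names in restaurant reviews) tend to share a dominant topic even if they never appear in labelled queries. In practice such features would be fed, alongside lexical ones, to a supervised token classifier; which classifier the paper uses is not stated in the abstract.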


Bibliographic reference.  Polifroni, Joe / Mairesse, François (2011): "Using latent topic features for named entity extraction in search queries", In INTERSPEECH-2011, 2129-2132.