Character-based embedding models provide robustness to misspellings and typos in natural language. In this paper, we explore convolutional neural network-based embedding models for handling out-of-vocabulary words in a meal description food ranking task. We demonstrate that combining character-based models with a standard word-based model improves the top-5 recall of USDA database food items from 26.3% to 30.3% on a test set of all USDA foods with typos simulated in 10% of the data. We also propose a new reranking strategy for predicting the top USDA food matches given a meal description, which significantly outperforms our prior method of n-best decoding with a finite state transducer, improving the top-5 recall on the all USDA foods task from 20.7% to 63.8%.
Cite as: Korpusik, M., Collins, Z., Glass, J. (2017) Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions. Proc. Interspeech 2017, 3320-3324, doi: 10.21437/Interspeech.2017-422
@inproceedings{korpusik17_interspeech,
  author={Mandy Korpusik and Zachary Collins and James Glass},
  title={{Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3320--3324},
  doi={10.21437/Interspeech.2017-422}
}