Symposium on Machine Learning in Speech and Language Processing (MLSLP)
Bellevue, WA, USA
Almost all of NLP consists of the following three kinds of tasks: transforming information from one representation into another, identifying within a larger collection a fragment or subset obeying given desiderata, and assigning an appropriate label to a given exemplar. These tasks used to be performed by algorithms using manually crafted rules. The power, and the promise, of Machine Learning is to perform much of this work automatically. Over the past 20 years, NLP researchers have explored many kinds of learning algorithms on existing and/or easily created corpora for a wide range of tasks and phenomena. By using existing corpora or cleverly leveraging resources, they have avoided the difficult tasks of designing representations and building training data. But nothing is free forever. To make headway and achieve increasingly high performance in various NLP tasks, we are going to need some deeper thinking about the phenomena themselves, about suitable representations for them, and about corpora that illustrate their complexity and scale. The core problem is that Machine Learning focuses on only half of the problem: it has nothing to say about the nature of the phenomena being addressed, and is not, ultimately, very useful for arriving at a deeper understanding of how language works. The best way forward for NLP, I believe, is to recognize that we need (at least) two kinds of researchers: the NLP linguists and the NLP engineers. In this talk I outline the problem and suggest ways in which ML researchers (who mostly are NLP engineers) can facilitate the work of NLP linguists. This effort will be repaid handsomely, since there remain many challenging problems in NLP, ready to be addressed once the linguistic perspectives and representations have been worked out.
Bibliographic reference. Hovy, Eduard (2011): "On the role of machine learning in NLP", In MLSLP-2011.