Named Entity Recognition (NER) is an important task in information extraction, and most attention has gone to written text in news or academic (especially biomedical) style. In this paper we report the first work on NER in spoken Chinese dialogues, as a preliminary step towards spoken language understanding. We treat NER as a sequential classification problem and solve it with a character-level maximum entropy (maxent) tagger. Although spoken data seems noisier than written data, with a set of carefully selected features the maxent tagger achieves an overall F1 score of 91.87 on our dialogue data.
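To illustrate what treating NER as character-level sequential classification can look like, here is a minimal sketch. It is not the paper's implementation: the BIO label scheme, the example sentence, the entity type `ORG`, and the feature template in `char_features` are all assumptions made for illustration, since the abstract does not list the features actually used.

```python
from typing import Dict, List, Tuple


def to_char_bio(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, type) spans (end exclusive) into one BIO label per character."""
    labels = ["O"] * len(text)
    for start, end, etype in entities:
        labels[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"          # remaining characters of the entity
    return labels


def char_features(text: str, i: int, prev_label: str) -> Dict[str, float]:
    """Hypothetical feature template: characters in a +/-1 window plus the previous label."""
    def pad(j: int) -> str:
        # Positions outside the utterance map to a padding symbol.
        return text[j] if 0 <= j < len(text) else "<PAD>"

    return {
        f"c0={pad(i)}": 1.0,
        f"c-1={pad(i - 1)}": 1.0,
        f"c+1={pad(i + 1)}": 1.0,
        f"c-1c0={pad(i - 1)}{pad(i)}": 1.0,
        f"prev={prev_label}": 1.0,
    }


# Assumed example utterance with one organisation entity at characters 3..6.
utterance = "我想去北京大学"
gold = to_char_bio(utterance, [(3, 7, "ORG")])
first_feats = char_features(utterance, 0, "<START>")
```

Each character then becomes one training instance, a feature dictionary paired with its label. A maxent model over such instances is equivalent to multinomial logistic regression, so any implementation of that classifier could be trained on them in this sketch.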
Bibliographic reference. Bao, Changchun / Xu, Weiqun / Yan, Yonghong (2008): "Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger", In INTERSPEECH-2008, 1145-1148.