11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Semi-Supervised Part-of-Speech Tagging in Speech Applications

Richard Dufour, Benoit Favre

LIUM - University of Le Mans, France

When no training or adaptation data is available, semi-supervised training is a good alternative for processing new domains. We perform Bayesian training of a part-of-speech (POS) tagger from unannotated text and a dictionary of possible tags for each word. We extend that method with supervised prediction of possible tags for out-of-vocabulary words and study the impact of both semi-supervision and starting dictionary size on three representative downstream tasks (named entity tagging, semantic role labeling, ASR output post-processing) that use POS tags as features. The outcome is either no impact or a small decrease in performance compared to using a fully supervised tagger, and even potential gains when the supervised tagger suffers from domain mismatch. Tasks that trust the tags completely (like ASR post-processing) are more affected by a reduction of the starting dictionary, but still yield a positive outcome.
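
As a rough illustration of how a dictionary of possible tags can constrain a tagger, the sketch below returns the candidate tags for a word and falls back to a guess for out-of-vocabulary words. It is not the method of the paper: the suffix-based guesser is a simple stand-in for the supervised prediction mentioned above, and the names `build_suffix_guesser`, `candidate_tags`, `TAG_DICT`, and `ALL_TAGS`, as well as the toy tag set, are assumptions made for this example.

```python
from collections import defaultdict


def build_suffix_guesser(tag_dict, max_len=3):
    """Map each word suffix (up to max_len characters) to the tags seen with it.

    Stand-in for a supervised out-of-vocabulary tag predictor (an assumption
    for this sketch, not the approach described in the abstract).
    """
    suffix_tags = defaultdict(set)
    for word, tags in tag_dict.items():
        for n in range(1, min(max_len, len(word)) + 1):
            suffix_tags[word[-n:]].update(tags)
    return suffix_tags


def candidate_tags(word, tag_dict, suffix_tags, all_tags, max_len=3):
    """Return the set of tags a tagger would be allowed to assign to word."""
    # Known word: only the tags listed in the dictionary are allowed.
    if word in tag_dict:
        return set(tag_dict[word])
    # Unknown word: use the longest suffix seen in the dictionary.
    for n in range(min(max_len, len(word)), 0, -1):
        tags = suffix_tags.get(word[-n:])
        if tags:
            return set(tags)
    # Nothing matched: leave the word unconstrained.
    return set(all_tags)


# Toy dictionary and tag set (hypothetical data for illustration only).
TAG_DICT = {
    "the": {"DET"},
    "walks": {"NOUN", "VERB"},
    "talked": {"VERB"},
    "dog": {"NOUN"},
}
ALL_TAGS = {"DET", "NOUN", "VERB"}
SUFFIX_TAGS = build_suffix_guesser(TAG_DICT)
```

In this sketch, a smaller starting dictionary means more words go through the fallback path, which is one way to picture why dictionary size matters for tasks that trust the tags completely.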

Bibliographic reference.  Dufour, Richard / Favre, Benoit (2010): "Semi-supervised part-of-speech tagging in speech applications", In INTERSPEECH-2010, 1373-1376.