Freely available software packages for training Conditional Random Fields, e.g. CRF++, do not support n-grams longer than bigrams. This can be attributed to the fact that training CRFs in the original formulation has a polynomial computation time dependence on the target vocabulary size and an exponential dependence on the n-gram order. We transfer the backing-off idea from language models to CRFs. We realized the software with Finite State Transducers, modifying the posterior computation. To implement the backing-off scheme, we applied failure transitions (φ) known from OpenFst. Proof of concept is given on the semantic tagging task MEDIA and on the grapheme-to-phoneme (G2P) conversion tasks NETtalk and Celex, showing that the increase in computation time stays well below the size of the target vocabulary and showing error rate reductions on the G2P tasks.
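The backing-off idea described above can be sketched with a small context automaton: each state encodes an n-gram history, and a failure (φ) arc points from each history to its longest existing proper-suffix history, so that an unmatched label makes the automaton back off to a shorter context instead of requiring arcs for every label at every state. This is an illustrative sketch only, not the paper's implementation; the class name, structure, and the requirement to add shorter contexts before longer ones are assumptions made for the example.

```python
class PhiAutomaton:
    """Sketch of n-gram contexts with failure (phi) back-off arcs."""

    def __init__(self):
        # state -> {label: next_state}; state 0 is the empty (back-off) context
        self.arcs = {0: {}}
        self.phi = {0: 0}        # failure links; empty context loops to itself
        self.context = {0: ()}   # history tuple encoded by each state

    def add_context(self, history):
        """Insert the n-gram context `history` (tuple of labels) as states.

        Assumes shorter suffix contexts are added before longer ones, so the
        phi links computed here point at already existing states.
        """
        state = 0
        for i, label in enumerate(history):
            nxt = self.arcs[state].get(label)
            if nxt is None:
                nxt = len(self.arcs)
                self.arcs[state][label] = nxt
                self.arcs[nxt] = {}
                self.context[nxt] = history[: i + 1]
                # failure link: longest existing proper suffix of this context
                self.phi[nxt] = self._longest_suffix_state(history[1 : i + 1])
            state = nxt
        return state

    def _longest_suffix_state(self, history):
        # try successively shorter suffixes until one exists as a state chain
        for start in range(len(history) + 1):
            state, ok = 0, True
            for label in history[start:]:
                if label not in self.arcs[state]:
                    ok = False
                    break
                state = self.arcs[state][label]
            if ok:
                return state
        return 0

    def step(self, state, label):
        """Advance by one label, following phi arcs on mismatch."""
        while label not in self.arcs[state] and state != 0:
            state = self.phi[state]
        return self.arcs[state].get(label, 0)
```

A trigram-order model would add all unigram and bigram contexts first, then trigram contexts; `step` then consumes labels and silently backs off whenever a long context has no matching arc.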
Bibliographic reference. Lehnen, Patrick / Hahn, Stefan / Ney, Hermann (2011): "N-grams for conditional random fields or a failure-transition (φ) posterior for acyclic FSTs", in INTERSPEECH-2011, pp. 1437-1440.