INTERSPEECH 2015
16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Neural Higher-Order Factors in Conditional Random Fields for Phoneme Classification

Martin Ratajczak (1), Sebastian Tschiatschek (2), Franz Pernkopf (1)

(1) Technische Universität Graz, Austria
(2) ETH Zürich, Switzerland

We explore neural higher-order input-dependent factors in linear-chain conditional random fields (LC-CRFs) for sequence labeling. Higher-order LC-CRFs with linear factors are well established for sequence labeling tasks, but they cannot model non-linear dependencies. Such non-linear dependencies, however, can be modelled efficiently by neural higher-order input-dependent factors, which map sub-sequences of inputs to sub-sequences of outputs. This mapping is important in many tasks, in particular for phoneme classification, where the representation of a phone strongly depends on the context phonemes. Experimental results for phoneme classification with LC-CRFs and neural higher-order factors confirm this, and we achieve the best phoneme classification performance ever reported on TIMIT, i.e. a phoneme error rate of 15.8%. Furthermore, we show that this success is not self-evident: linear higher-order factors degrade phoneme classification performance on TIMIT.
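The central idea of the abstract, a factor that maps a sub-sequence of inputs to scores over a sub-sequence of labels, can be illustrated with a small LC-CRF. What follows is a minimal sketch under our own assumptions, not the authors' implementation. It uses only the simplest label sub-sequence beyond first order, the pair (y_{t-1}, y_t), scored by a hypothetical one-hidden-layer tanh network (`NeuralPairFactor`) applied to the input window (x_{t-1}, x_t), plus a linear first-order factor. The weights are random rather than trained, and the sizes D, K, T are arbitrary. The forward algorithm computes the log-partition function log Z, so the conditional probability of a label sequence is exp(score - log Z).

```python
import math
import random
from itertools import product


def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class NeuralPairFactor:
    """Hypothetical neural factor: maps the input window (x_{t-1}, x_t) to a
    score for every label pair (y_{t-1}, y_t) via a one-hidden-layer tanh MLP.
    Weights are random here; in practice they would be trained."""

    def __init__(self, input_dim, hidden_dim, num_labels, seed=0):
        rng = random.Random(seed)
        self.K = num_labels
        in_dim = 2 * input_dim  # concatenation of two input frames
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.W2 = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)]
                   for _ in range(num_labels * num_labels)]
        self.b2 = [0.0] * (num_labels * num_labels)

    def scores(self, x_prev, x_cur):
        z = list(x_prev) + list(x_cur)
        h = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]
        # Reshape the flat output into a K x K table indexed [y_prev][y_cur].
        return [out[i * self.K:(i + 1) * self.K] for i in range(self.K)]


def unary_scores(U, x_t):
    # Assumed linear first-order factor: one weight vector per label.
    return [sum(w * v for w, v in zip(row, x_t)) for row in U]


def log_partition(xs, U, pair):
    # Forward algorithm: alpha_t(j) = u_t(j) + logsumexp_i(alpha_{t-1}(i) + P_t(i, j)).
    K = pair.K
    alpha = unary_scores(U, xs[0])
    for t in range(1, len(xs)):
        u = unary_scores(U, xs[t])
        P = pair.scores(xs[t - 1], xs[t])
        alpha = [u[j] + logsumexp([alpha[i] + P[i][j] for i in range(K)]) for j in range(K)]
    return logsumexp(alpha)


def sequence_score(xs, ys, U, pair):
    # Unnormalized log-score of one label sequence ys.
    s = unary_scores(U, xs[0])[ys[0]]
    for t in range(1, len(xs)):
        s += unary_scores(U, xs[t])[ys[t]]
        s += pair.scores(xs[t - 1], xs[t])[ys[t - 1]][ys[t]]
    return s


# Usage with arbitrary sizes: D input features, K labels, T frames.
rng = random.Random(1)
D, K, T = 3, 4, 5
xs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
U = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
pair = NeuralPairFactor(D, hidden_dim=6, num_labels=K)
logZ = log_partition(xs, U, pair)
ys = [0, 1, 1, 2, 3]
print(math.exp(sequence_score(xs, ys, U, pair) - logZ))  # p(ys | xs)
```

Replacing the tanh hidden layer in `NeuralPairFactor.scores` with a direct linear map of the concatenated window would give a linear higher-order factor, the kind the abstract reports as degrading performance on TIMIT. The forward recursion is unchanged either way, because it only consumes the K x K table of pair scores.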


Bibliographic reference: Ratajczak, Martin / Tschiatschek, Sebastian / Pernkopf, Franz (2015): "Neural higher-order factors in conditional random fields for phoneme classification", In INTERSPEECH-2015, 2137-2141.