INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Discriminatively Learning Factorized Finite State Pronunciation Models from Dynamic Bayesian Networks

Preethi Jyothi (1), Eric Fosler-Lussier (1), Karen Livescu (2)

(1) Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
(2) Toyota Technological Institute at Chicago, Chicago, IL, USA

This paper describes an approach to efficiently derive, and discriminatively train, a weighted finite state transducer (WFST) representation for an articulatory feature-based model of pronunciation. The model was originally implemented as a dynamic Bayesian network (DBN). The work is motivated by a desire to (1) incorporate such a pronunciation model in WFST-based recognizers, and (2) learn discriminative models that are more general than the DBNs. The approach is quite general, though here we show how it applies to a specific model. We use the conditional independence assumptions imposed by the DBN to efficiently convert it into a sequence of WFSTs (factor FSTs) which, when composed, yield the same model as the DBN. We then introduce a linear model of the arc weights of the factor FSTs and discriminatively learn its weights using the averaged perceptron algorithm. We demonstrate the approach using a lexical access task in which we recognize a word given its surface realization. Our experimental results using a phonetically transcribed subset of the Switchboard corpus show that the discriminatively learned model performs significantly better than the original DBN.
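
As a rough illustration of the learning step mentioned in the abstract, the following is a minimal sketch of averaged perceptron training for a linear scoring model on a toy lexical access task. It is not the authors' implementation. Several things here are assumptions made only for illustration: the feature names (such as "sub:t>dx"), the toy data, the function names, and the use of an explicit list of candidate words in place of decoding over composed factor FSTs. In the setting the abstract describes, the features would presumably relate to arcs of the factor FSTs; here each candidate word simply carries a hypothetical dictionary of feature counts.

```python
from collections import defaultdict

def score(weights, feats):
    """Linear score: dot product of the weight vector and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def predict(weights, candidates):
    """Return the highest-scoring candidate word.

    `candidates` maps each word to its (hypothetical) feature counts.
    Sorting the words makes tie-breaking deterministic.
    """
    return max(sorted(candidates), key=lambda word: score(weights, candidates[word]))

def train_averaged_perceptron(data, epochs=3):
    """Averaged perceptron: update on mistakes, return the average of the
    weight vectors seen after every training example."""
    weights = defaultdict(float)
    totals = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            guess = predict(weights, candidates)
            if guess != gold:
                # Move toward the correct word's features, away from the guess's.
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[guess].items():
                    weights[name] -= value
            steps += 1
            # Accumulate the current weights for averaging.
            for name, value in weights.items():
                totals[name] += value
    return {name: total / steps for name, total in totals.items()}

# Toy data (invented for illustration): each item pairs a candidate set with
# the correct word for one observed surface form.
TOY_DATA = [
    ({"little": {"sub:t>dx": 1, "match": 3},
      "lid": {"ins:l": 1, "sub:d>dx": 1, "match": 2}}, "little"),
    ({"probably": {"del:b": 1, "del:ax": 1, "match": 5},
      "problem": {"sub:m>iy": 1, "del:ax": 1, "match": 4}}, "probably"),
    ({"water": {"sub:t>dx": 1, "match": 3},
      "wader": {"sub:d>dx": 1, "match": 3}}, "water"),
]

averaged_weights = train_averaged_perceptron(TOY_DATA, epochs=3)
predictions = [predict(averaged_weights, cands) for cands, _ in TOY_DATA]
```

Averaging the weight vectors over all training steps, rather than keeping only the final vector, is the standard way the averaged perceptron reduces sensitivity to the order of the last few updates. A full system would replace the explicit candidate dictionaries with a search over the composed transducers.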

Index Terms: articulatory features, discriminative training, finite state transducers, dynamic Bayesian networks

Bibliographic reference.  Jyothi, Preethi / Fosler-Lussier, Eric / Livescu, Karen (2012): "Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks", In INTERSPEECH-2012, 1063-1066.