11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

CRF-Based Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection

Dong Wang (1), Simon King (1), Nicholas Evans (2), Raphaël Troncy (2)

(1) University of Edinburgh, UK
(2) EURECOM, France

Out-of-vocabulary (OOV) terms present a significant challenge to spoken term detection (STD). This challenge lies, to a large extent, in the high degree of uncertainty in the pronunciations of OOV terms. In previous work, we presented a stochastic pronunciation modeling (SPM) approach to compensate for this uncertainty. A shortcoming of that work, however, is that the SPM was based on a joint-multigram model (JMM), which is suboptimal. In this paper, we propose to use conditional random fields (CRFs) for letter-to-sound conversion, which significantly improves the quality of the predicted pronunciations. Applied to OOV STD, the CRF-based approach yields considerable performance improvements for both a 1-best system and an SPM-based system.

Bibliographic reference. Wang, Dong / King, Simon / Evans, Nicholas / Troncy, Raphaël (2010): "CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection", in INTERSPEECH-2010, 1668-1671.