14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

An Efficient Method to Estimate Pronunciation from Multiple Utterances

Tofigh Naghibi, Sarah Hoffmann, Beat Pfister

ETH Zürich, Switzerland

Given K utterances of a word and a set of sub-word units, deriving their underlying word model (pronunciation) may require a generalization of the conventional one-dimensional Viterbi algorithm that decodes the utterances jointly. This extension is called K-dimensional Viterbi. However, the complexity of the K-dimensional Viterbi algorithm grows exponentially with the number of utterances, which quickly imposes a prohibitive computational burden. Here, we propose an approximation algorithm for K-dimensional Viterbi that uses the available utterances efficiently to estimate the pronunciation. Besides automatic dictionary generation, it can be used in computationally expensive applications such as lexicon-free training and joint pattern alignment.
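The abstract does not spell out the K-dimensional Viterbi recursion or the proposed approximation. The following is a minimal Python sketch of one common formulation of *exact* K-dimensional Viterbi, assumed here for illustration and not taken from the paper. All K utterances share one state path, and each step advances a nonempty subset of them by one frame. The function names `k_dim_viterbi` and `trellis_size` and the log-probability array layouts are hypothetical. The sketch only shows where the exponential cost comes from; it does not reproduce the authors' approximation.

```python
import itertools
import math

NEG_INF = float("-inf")


def trellis_size(lengths, num_states):
    """Number of (t_1, ..., t_K, state) cells in the K-dimensional trellis."""
    return num_states * math.prod(n + 1 for n in lengths)


def k_dim_viterbi(log_emit, log_trans, log_init):
    """Jointly decode K utterances along one shared state path (assumed formulation).

    log_emit[k][t][j]: log-likelihood of frame t of utterance k in state j.
    log_trans[i][j]:   log transition probability from state i to state j.
    log_init[j]:       log probability of starting in state j.
    Returns (best joint log score, shared state sequence).
    """
    K, N = len(log_emit), len(log_init)
    lengths = tuple(len(u) for u in log_emit)
    origin = (0,) * K
    # Each step advances a nonempty subset of the utterances by one frame: 2^K - 1 choices.
    steps = [d for d in itertools.product((0, 1), repeat=K) if any(d)]
    score, back = {}, {}

    # Lexicographic order is a valid processing order: every predecessor
    # point is component-wise <= the current point and differs from it.
    for point in itertools.product(*(range(n + 1) for n in lengths)):
        if point == origin:
            continue
        for d in steps:
            prev = tuple(p - s for p, s in zip(point, d))
            if min(prev) < 0:
                continue
            for j in range(N):
                # Frames consumed in this step are all emitted by the shared state j.
                emit = sum(log_emit[k][prev[k]][j] for k in range(K) if d[k])
                if prev == origin:
                    candidates = [(log_init[j] + emit, None)]
                else:
                    candidates = [
                        (score[(prev, i)] + log_trans[i][j] + emit, (prev, i))
                        for i in range(N)
                        if (prev, i) in score
                    ]
                for cand, pred in candidates:
                    if cand > score.get((point, j), NEG_INF):
                        score[(point, j)] = cand
                        back[(point, j)] = pred

    # Backtrack from the point where every utterance has been fully consumed.
    end = lengths
    best_state = max(range(N), key=lambda j: score.get((end, j), NEG_INF))
    best_score = score.get((end, best_state), NEG_INF)
    if best_score == NEG_INF:
        return NEG_INF, []
    path, cell = [], (end, best_state)
    while cell is not None:
        path.append(cell[1])
        cell = back[cell]
    return best_score, path[::-1]


if __name__ == "__main__":
    # Toy left-to-right model with two states; state 0 explains frame "a", state 1 frame "b".
    a = [math.log(0.9), math.log(0.1)]
    b = [math.log(0.1), math.log(0.9)]
    log_emit = [[a, b], [a, b]]  # two utterances, each "a b"
    log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, math.log(0.5)]]
    log_init = [0.0, NEG_INF]
    best, path = k_dim_viterbi(log_emit, log_trans, log_init)
    print(path)  # → [0, 1]
```

Under this formulation, the trellis has N·∏(T_k + 1) cells, and each cell examines (2^K − 1)·N predecessors. The cost therefore grows as O(2^K N² ∏ T_k), exponentially in K. For example, five utterances of 100 frames each already give about 10^10 time points before multiplying by the number of states. This is the kind of burden the proposed approximation is meant to avoid.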


Bibliographic reference.  Naghibi, Tofigh / Hoffmann, Sarah / Pfister, Beat (2013): "An efficient method to estimate pronunciation from multiple utterances", In INTERSPEECH-2013, 1951-1955.