INTERSPEECH 2013

Given K utterances of a word and a set of subword units, one may need a generalization of the conventional one-dimensional Viterbi algorithm to decode them jointly and derive their underlying word model (pronunciation). This extension is called the K-dimensional Viterbi algorithm. However, as the number of utterances increases, the complexity of the K-dimensional Viterbi algorithm grows exponentially, causing a prohibitive computational burden. Here, we propose an approximation algorithm for the K-dimensional Viterbi algorithm that uses the available utterances efficiently to estimate the pronunciation. In addition to automatic dictionary generation, it can be used in computationally expensive applications such as lexicon-free training and joint pattern alignment.
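For orientation, the conventional one-dimensional Viterbi algorithm that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration for a discrete toy model and is not the approximation proposed in the paper. The names `viterbi` and `joint_trellis_nodes` and the dictionary-based model layout are assumptions made for this sketch. The node count is only a rough indication of why joint decoding becomes expensive, assuming a full grid over all utterance lengths with no pruning.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Conventional one-dimensional Viterbi: best state path for ONE sequence."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this time step
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

def joint_trellis_nodes(lengths, n_states):
    """Rough node count of a full joint trellis (assumption: no pruning):
    n_states times the product of the K utterance lengths."""
    return n_states * math.prod(lengths)
```

Under this assumption, one utterance of 50 frames with 40 subword states gives 2,000 trellis nodes, while three such utterances decoded jointly give 5,000,000, which illustrates the exponential growth in K that motivates an approximation.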
Bibliographic reference. Naghibi, Tofigh / Hoffmann, Sarah / Pfister, Beat (2013): "An efficient method to estimate pronunciation from multiple utterances", In INTERSPEECH-2013, 1951-1955.