Given K utterances of a word and a set of sub-word units, one may need a generalization of the conventional one-dimensional Viterbi algorithm to decode the utterances jointly and derive their underlying word model (pronunciation). This extension is called the k-dimensional Viterbi algorithm. However, its complexity grows exponentially with the number of utterances, which makes the computational burden prohibitive. Here, we propose an approximation algorithm for the k-dimensional Viterbi algorithm that makes efficient use of the available utterances to estimate the pronunciation. In addition to automatic dictionary generation, it can be used in computationally expensive applications such as lexicon-free training and joint pattern alignment.
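For orientation, the conventional one-dimensional Viterbi algorithm that the abstract takes as its starting point can be sketched as below. This is a minimal illustration only, not the approximation algorithm proposed in the paper. The function names `viterbi` and `joint_trellis_cells`, and the layout of the log-probability tables, are assumptions made for this example. The second function assumes that a full joint search has one trellis axis per utterance, so the number of cells is the product of the utterance lengths; that is one way to see why the cost grows exponentially with the number of utterances.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Conventional one-dimensional Viterbi decoding in the log domain.

    Hypothetical table layout, chosen for this sketch:
      log_init[s]     : log-probability of starting in state s
      log_trans[p][s] : log-probability of moving from state p to state s
      log_emit[t][s]  : log-likelihood of frame t under state s
    Returns the best state path and its log score.
    """
    n_states = len(log_init)
    n_frames = len(log_emit)

    # Scores of the best partial paths ending in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t

    for t in range(1, n_frames):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def joint_trellis_cells(lengths):
    """Cells in a full joint trellis with one axis per utterance (assumption)."""
    return math.prod(lengths)
```

Calling `viterbi` on a single utterance returns one state sequence and its log score. Calling `joint_trellis_cells` with one length per utterance shows how quickly an exhaustive joint search grows as utterances are added, which is the cost the proposed approximation is meant to avoid.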
Bibliographic reference. Naghibi, Tofigh / Hoffmann, Sarah / Pfister, Beat (2013): "An efficient method to estimate pronunciation from multiple utterances", In INTERSPEECH-2013, 1951-1955.