EUROSPEECH 2001 Scandinavia
Extending the vocabulary of a large vocabulary speech recognition system usually requires that phonetic transcriptions be known for all words. With automatic phonetic baseform determination, acoustic samples of the words in question can substitute for the required expert knowledge. In this paper we follow a probabilistic approach to this problem and present a novel breadth-first search algorithm that takes full advantage of multiple samples. An extension of the algorithm that generates phone graphs and an EM-based iteration scheme for estimating stochastic pronunciation models are also presented. In preliminary experiments, phoneme error rates below 5% with respect to the standard pronunciation are achieved without language- or word-specific prior knowledge.
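To make the reported metric concrete, the following is a minimal sketch of how a phoneme error rate against a reference pronunciation could be computed. It assumes the usual definition (Levenshtein distance between phoneme sequences, divided by the reference length); the abstract does not spell out the exact scoring protocol, so this is an illustration rather than the authors' procedure. The function name `phoneme_error_rate` and the phone symbols in the usage example are hypothetical.

```python
def phoneme_error_rate(reference, hypothesis):
    """Edit distance between two phoneme sequences, divided by the reference length.

    Assumption: unit cost for substitutions, insertions, and deletions.
    """
    if not reference:
        raise ValueError("reference must contain at least one phoneme")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j phonemes of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]  # aligning i reference phonemes with an empty hypothesis
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing
                curr[j - 1] + 1,     # insertion: extra hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one substituted phoneme out of four.
standard = ["s", "p", "iy", "ch"]
derived = ["s", "p", "ih", "ch"]
rate = phoneme_error_rate(standard, derived)
```

Under this definition, an error rate below 5% corresponds to fewer than one phoneme error per 20 reference phonemes on average.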
Bibliographic reference. Bisani, M. / Ney, Hermann (2001): "Breadth-first search for finding the optimal phonetic transcription from multiple utterances", In EUROSPEECH-2001, 1429-1432.