EUROSPEECH 2003 - INTERSPEECH 2003
Modeling pronunciation variation is key to recognizing conversational speech. We argue that pronunciation modeling should not be limited to the dictionary: triphone clustering is an integral part of it. We propose a new approach called enhanced tree clustering. In contrast to traditional decision-tree-based state tying, this approach allows parameter sharing across phonemes. We show that accurate pronunciation modeling can be achieved through efficient parameter sharing in the acoustic model. Combined with a single pronunciation dictionary, the approach yields a 1.8% absolute reduction in word error rate on Switchboard, a large-vocabulary conversational speech recognition task.
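The contrast between traditional state tying and tying that shares parameters across phonemes can be illustrated with a toy sketch. This is not the authors' implementation: the triphone inventory, the phone classes, and the hand-picked questions below are all hypothetical, and a real system would grow its trees from data by choosing questions that maximize a likelihood gain. The sketch only shows the structural difference: when each center phone has its own tree, no tied class can span two phonemes, whereas a single shared tree may ask about the center phone and so place states of different phonemes in the same class.

```python
from collections import defaultdict

# Hypothetical toy inventory: (left context, center phone, right context).
TRIPHONES = [
    ("s", "ih", "t"),
    ("s", "ix", "t"),
    ("b", "ih", "t"),
    ("b", "ix", "n"),
]

# Hypothetical phone classes used by the toy questions.
REDUCIBLE_VOWELS = {"ih", "ix"}
FRICATIVES = {"s", "z", "f"}


def traditional_tying(triphones):
    """One tree per center phone: questions look only at the context,
    so a tied class never contains more than one center phone."""
    classes = defaultdict(list)
    for left, center, right in triphones:
        # The center phone selects the tree; the question splits on the left context.
        classes[(center, left in FRICATIVES)].append((left, center, right))
    return dict(classes)


def shared_tree_tying(triphones):
    """One tree for all phones: questions may also ask about the class of
    the center phone, so a tied class can span several phonemes."""
    classes = defaultdict(list)
    for left, center, right in triphones:
        # Both questions are hand-picked here (an assumption of this sketch).
        key = (center in REDUCIBLE_VOWELS, left in FRICATIVES)
        classes[key].append((left, center, right))
    return dict(classes)


def center_phones(tied_class):
    """Return the set of center phones that share one tied class."""
    return {center for _left, center, _right in tied_class}


traditional = traditional_tying(TRIPHONES)
shared = shared_tree_tying(TRIPHONES)
```

In this toy setup the per-phone trees keep `ih` and `ix` apart regardless of context, while the shared tree pools them whenever the context questions agree, which is the kind of cross-phoneme sharing the abstract describes.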
Bibliographic reference. Yu, Hua / Schultz, Tanja (2003): "Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition", In EUROSPEECH-2003, 1869-1872.