INTERSPEECH 2011
This paper presents an effective triphone mapping for acoustic model training in automatic speech recognition, which allows the synthesis of unseen triphones. We describe this data-driven model clustering and present experiments performed on 350 hours of a Slovak audio database of mixed read and spontaneous speech. The proposed technique is compared with tree-based state tying, and it is shown that for larger acoustic models, with 4000 states or more, the triphone-mapped HMM system achieves better performance than the tree-based state tying system. The main performance gain comes from applying the triphone mapping to monophones with multiple Gaussian pdfs, so that the cloned triphones are initialized better than from single-Gaussian monophones. The absolute decrease in word error rate was 0.46% (5.73% relative) for models with 7500 states, falling to a 0.4% (5.17% relative) gain at 11500 states.
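As a rough illustration of the idea, the sketch below maps an unseen triphone to a seen triphone sharing the same center phone, falling back to the monophone when no candidate exists. The HTK-style "l-c+r" notation, the function names, and the context-matching heuristic are assumptions for illustration only; the paper's mapping is data-driven rather than rule-based.

```python
# Minimal sketch of triphone mapping for unseen triphones.
# Assumption: the paper derives the mapping from a data-driven distance
# between triphone models; a simple context-matching heuristic stands in
# for that distance here, purely for illustration.

def parse_triphone(tri):
    """Split an HTK-style triphone 'l-c+r' into (left, center, right)."""
    left, rest = tri.split("-", 1)
    center, right = rest.split("+", 1)
    return left, center, right

def map_triphone(unseen, seen_triphones):
    """Map an unseen triphone to the closest seen one.

    Preference order (illustrative heuristic, not the paper's metric):
    same center and right context > same center and left context >
    same center only; fall back to the center monophone if nothing matches.
    """
    l, c, r = parse_triphone(unseen)
    candidates = [t for t in seen_triphones if parse_triphone(t)[1] == c]
    if not candidates:
        return c  # monophone fallback
    def score(t):
        tl, _, tr = parse_triphone(t)
        return (tr == r) * 2 + (tl == l)
    return max(candidates, key=score)

if __name__ == "__main__":
    seen = ["a-b+c", "x-b+c", "a-b+d", "k-m+n"]
    print(map_triphone("z-b+c", seen))  # -> 'a-b+c' (same center and right context)
    print(map_triphone("z-q+c", seen))  # -> 'q' (no seen triphone with center 'q')
```

In the actual system such a mapping lets every triphone required by the pronunciation lexicon receive a model, with the cloned models then re-estimated on the training data.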
Bibliographic reference. Darjaa, Sakhia / Cerňak, Miloš / Trnka, Marián / Rusko, Milan / Sabo, Róbert (2011): "Effective triphone mapping for acoustic modeling in speech recognition", In INTERSPEECH-2011, 1717-1720.