We propose a method for incorporating additional knowledge sources, namely accent, gender, and wide-context dependency information, into automatic speech recognition (ASR) systems using Bayesian networks (BNs). We first incorporate only pentaphone-context dependency information, and then also integrate accent and gender information. The method makes it easy to extend conventional triphone hidden Markov models (HMMs) to cover various sources of knowledge: the probabilistic dependencies between a triphone context unit and the additional knowledge are learned through a BN. A further advantage is that the additional knowledge variables are assumed to be hidden during recognition, so the existing standard triphone-based decoding system can be used without modification. The proposed model was evaluated on a large-vocabulary continuous speech recognition (LVCSR) task using two different types of accented English speech data. Experimental results show that the proposed method improves word accuracy over standard triphone models.
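The idea of treating the additional knowledge variables as hidden during recognition can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional Gaussian observation model, the function names, and all numeric values are assumptions chosen for illustration. The sketch computes a state likelihood by summing over a hidden variable k (for example, an accent and gender combination), which leaves a single density per triphone state that a standard decoder could consume.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_likelihood(x, prior, means, variances):
    """Marginal likelihood p(x | state) = sum_k P(k | state) * p(x | state, k).

    k indexes the values of a hidden knowledge variable; it is summed out,
    so the caller never needs to know its value at recognition time.
    """
    return sum(p * gaussian_pdf(x, m, v) for p, m, v in zip(prior, means, variances))


# Hypothetical parameters for one triphone state and four hidden values of k
# (e.g. two accents x two genders). These numbers are made up.
prior = [0.4, 0.3, 0.2, 0.1]      # P(k | state), sums to 1
means = [-1.0, 0.0, 1.0, 2.0]     # mean of p(x | state, k)
variances = [1.0, 1.0, 0.5, 0.5]  # variance of p(x | state, k)

likelihood = state_likelihood(0.3, prior, means, variances)
print(likelihood)
```

Under these assumptions the marginal is simply a mixture of the conditional densities, which is why the result has the same form as an ordinary HMM state output density.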
Cite as: Sakti, S., Markov, K., Nakamura, S. (2006) The use of Bayesian network for incorporating accent, gender and wide-context dependency information. Proc. Interspeech 2006, paper 1812-Wed1BuP.4, doi: 10.21437/Interspeech.2006-438
@inproceedings{sakti06_interspeech,
  author={Sakriani Sakti and Konstantin Markov and Satoshi Nakamura},
  title={{The use of Bayesian network for incorporating accent, gender and wide-context dependency information}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1812-Wed1BuP.4},
  doi={10.21437/Interspeech.2006-438}
}