We explore the integration of multiple factors, such as genre and speaker gender, into acoustic model adaptation to improve Mandarin ASR system performance on broadcast news and broadcast conversation audio. We investigate multifactor clustering of acoustic model training data and the application of MPE-MAP and fMPE-MAP acoustic model adaptation. We find that effectively combining these adaptation approaches yields a 6% relative reduction in recognition error rate compared to a Mandarin recognition system that does not use genre-specific acoustic models, and a 5% relative improvement when the genre-adaptive system is combined with another, genre-independent state-of-the-art system.
Bibliographic reference. Wang, Wen / Mandal, Arindam / Lei, Xin / Stolcke, Andreas / Zheng, Jing (2009): "Multifactor adaptation for Mandarin broadcast news and conversation speech recognition", In INTERSPEECH-2009, 2103-2106.