10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Multifactor Adaptation for Mandarin Broadcast News and Conversation Speech Recognition

Wen Wang, Arindam Mandal, Xin Lei, Andreas Stolcke, Jing Zheng

SRI International, USA

We explore the integration of multiple factors, such as genre and speaker gender, in acoustic model adaptation to improve Mandarin ASR system performance on broadcast news and broadcast conversation audio. We investigate multifactor clustering of acoustic model training data and the application of MPE-MAP and fMPE-MAP acoustic model adaptation. By effectively combining these adaptation approaches, we achieve a 6% relative reduction in recognition error rate compared to a Mandarin recognition system that does not use genre-specific acoustic models, and a 5% relative improvement when the genre-adaptive system is combined with another, genre-independent state-of-the-art system.
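
The gains above are stated as relative, not absolute, reductions in error rate. As a minimal sketch of that arithmetic, the helper below computes a relative reduction from two error rates. The function name `relative_reduction` and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent (assumed values, not from the paper).
baseline = 20.0  # system without genre-specific acoustic models
adapted = 18.8   # system with the combined adaptation approaches

# Print the relative reduction as a percentage with one decimal place.
print(f"relative reduction: {relative_reduction(baseline, adapted):.1%}")
```

Under these assumed figures, an absolute drop of 1.2 points from a 20.0% baseline corresponds to a relative reduction of about 6%.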

Bibliographic reference.  Wang, Wen / Mandal, Arindam / Lei, Xin / Stolcke, Andreas / Zheng, Jing (2009): "Multifactor adaptation for Mandarin broadcast news and conversation speech recognition", In INTERSPEECH-2009, 2103-2106.