This paper examines IBM's submission for the Phase II speaker recognition evaluation of the DARPA-sponsored Robust Automatic Transcription of Speech (RATS) program. The objectives of the paper are threefold: (1) to provide a system description, (2) to identify key techniques for performance improvement, and (3) to quantify their contribution. The system design revolves around exploiting diversity and modeling complementary information at all levels. To speed up system development, a push-button system was designed whereby all system development steps could be completed rapidly. Noise robustness was improved by utilizing two speech activity detectors (SADs) and five acoustic feature extractors. Furthermore, the probabilistic linear discriminant analysis (PLDA) based back-ends were trained with two different data subsets. To better exploit the complementary information, system combination was performed in two modules. The first module trained new PLDA back-ends from concatenated compact representations, while the second combined all the system scores and duration-related side information in a neural network. The official results from the Phase II evaluation are also examined. The results indicate that for the 30s-30s task the overall system outperformed the best single system by 46% and 40% on the internal and evaluation test sets, respectively.
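The second combination module described above (subsystem scores plus duration-related side information fed to a neural network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single tanh hidden layer, the use of log durations as side information, the synthetic scores, and all function names (`build_fusion_inputs`, `init_params`, `forward`, `train`) are hypothetical.

```python
import numpy as np

def build_fusion_inputs(scores, dur_enroll, dur_test):
    """Stack subsystem scores with log-duration side information.

    Assumption: durations enter as log seconds; the paper's encoding is not given.
    """
    side = np.column_stack([np.log(dur_enroll), np.log(dur_test)])
    X = np.hstack([scores, side])
    # Standardize each column so gradient descent behaves well.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def init_params(n_in, n_hidden, rng):
    """Small random weights for a one-hidden-layer network (assumed topology)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=n_hidden),
        "b2": 0.0,
    }

def forward(params, X):
    """Return hidden activations and the fused target-trial probability."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    logit = H @ params["W2"] + params["b2"]
    return H, 1.0 / (1.0 + np.exp(-logit))

def cross_entropy(p, y):
    """Mean binary cross-entropy between probabilities p and labels y."""
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def train(params, X, y, lr=0.1, epochs=300):
    """Full-batch gradient descent on the cross-entropy loss."""
    n = len(y)
    for _ in range(epochs):
        H, p = forward(params, X)
        d_logit = (p - y) / n                       # dLoss/dlogit
        dW2 = H.T @ d_logit
        db2 = d_logit.sum()
        dH = np.outer(d_logit, params["W2"]) * (1.0 - H ** 2)
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        params["W1"] -= lr * dW1
        params["b1"] -= lr * db1
        params["W2"] -= lr * dW2
        params["b2"] -= lr * db2
    return params

# Synthetic trials: 3 hypothetical subsystems, target trials score higher on average.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400).astype(float)
scores = 1.5 * labels[:, None] + rng.normal(size=(400, 3))
dur_enroll = rng.uniform(3.0, 30.0, size=400)       # seconds, synthetic
dur_test = rng.uniform(3.0, 30.0, size=400)         # seconds, synthetic

X = build_fusion_inputs(scores, dur_enroll, dur_test)
params = init_params(X.shape[1], 4, rng)
loss_before = cross_entropy(forward(params, X)[1], labels)
params = train(params, X, labels)
loss_after = cross_entropy(forward(params, X)[1], labels)
print(f"cross-entropy before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In a real system the fusion network would be trained on held-out development trials and evaluated on a separate set; the sketch trains and reports on the same synthetic data only to keep the example short.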
Bibliographic reference. Zhu, Weizhong / Yaman, Sibel / Pelecanos, Jason (2013): "The IBM RATS phase II speaker recognition system: overview and analysis", In INTERSPEECH-2013, 3137-3141.