Input-level and output-level fusion methods are compared for fusing Mel-frequency cepstral coefficients (MFCCs) with their corresponding delta coefficients. A 49-speaker subset of the King database is used under wideband and telephone conditions. The best input-level fusion system is more computationally complex than the output-level fusion system. Both the input-level and output-level fusion systems outperformed the best purely MFCC-based system on wideband data. On King telephone data, only the output-level fusion system outperformed the best purely MFCC-based system. Further experiments were performed on NIST'96 data under matched and mismatched conditions. Provided it was well tuned, the output-level fused system outperformed the input-level fused system under all experimental conditions.
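The difference between the two fusion styles can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the classifier, the delta computation, or the score-combination rule. The sketch assumes that input-level fusion means concatenating each MFCC frame with its delta frame before a single classifier, and that output-level fusion means a weighted sum of per-speaker scores from two separate classifiers. The function names, the regression window `width`, and the tunable `weight` are all hypothetical.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta coefficients over +/- `width` frames.

    Assumption: a standard regression formula with edge frames repeated;
    the paper's exact delta computation is not given in the abstract.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + num_frames]    # frame t + k
        behind = padded[width - k : width - k + num_frames]   # frame t - k
        out += k * (ahead - behind)
    return out / denom

def input_level_fusion(mfcc, deltas):
    """Concatenate the two streams per frame so one classifier sees both."""
    return np.hstack([mfcc, deltas])

def output_level_fusion(scores_mfcc, scores_delta, weight=0.5):
    """Weighted sum of per-speaker scores from two separate classifiers.

    Assumption: a linear combination with a hypothetical tunable `weight`.
    """
    scores_mfcc = np.asarray(scores_mfcc, dtype=float)
    scores_delta = np.asarray(scores_delta, dtype=float)
    return weight * scores_mfcc + (1.0 - weight) * scores_delta
```

In this sketch, input-level fusion doubles the feature dimension seen by the single classifier, whereas output-level fusion keeps two lower-dimensional models and adds one weight to tune, which fits the abstract's remark that the output-level system needs to be well tuned.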
Cite as: Slomka, S., Sridharan, S., Chandran, V. (1998) A comparison of fusion techniques in mel-cepstral based speaker identification. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0123, doi: 10.21437/ICSLP.1998-236
@inproceedings{slomka98_icslp,
  author={Stefan Slomka and Sridha Sridharan and Vinod Chandran},
  title={{A comparison of fusion techniques in mel-cepstral based speaker identification}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0123},
  doi={10.21437/ICSLP.1998-236}
}