In speaker recognition, when the cepstral coefficients are calculated from the LPC analysis parameters, the LPC residual and pitch are usually ignored. This paper describes an approach that integrates the pitch and LPC residual with the LPC cepstrum in a Gaussian Mixture Model based speaker recognition system. The pitch is represented as the logarithm of F0, and the LPC residual as an MFCC vector. A second goal of this research is to verify whether the correlation between the different information sources is useful for the speaker recognition task. The results show that adding the pitch gives a significant improvement only when the correlation between the pitch and the cepstral coefficients is used. Adding only the LPC residual also gives a significant improvement, but using its correlation with the cepstral coefficients does not have a large effect. The best results achieved are a 98.5% speaker identification rate and a 0.21% speaker verification equal error rate, compared with 97.0% and 1.07%, respectively, for the baseline system.
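As a rough illustration of what "using the correlation" between two information sources can mean, the sketch below scores one frame consisting of a single cepstral coefficient and a log F0 value under a 2-D Gaussian, once with a zero cross-covariance (the streams are treated as independent) and once with a non-zero one. This is only an assumed reading: the abstract does not state how the correlation is modelled, and the function names and all numeric values here are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def log_gauss_1d(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_gauss_2d(x, mean, var, cov=0.0):
    """Log-density of a 2-D Gaussian over (cepstral coefficient, log F0).

    x, mean: (c, p) pairs; var: (var_c, var_p);
    cov: covariance between the two components.
    cov=0.0 treats the two streams as independent.
    """
    dc = x[0] - mean[0]
    dp = x[1] - mean[1]
    det = var[0] * var[1] - cov * cov
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form with the inverse of [[var_c, cov], [cov, var_p]].
    quad = (var[1] * dc * dc - 2.0 * cov * dc * dp + var[0] * dp * dp) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


# Hypothetical frame and model parameters (illustration only).
frame = (0.3, math.log(120.0))   # (cepstral coefficient, log F0)
mean = (0.0, math.log(110.0))
var = (0.04, 0.01)

independent = log_gauss_2d(frame, mean, var, cov=0.0)
joint = log_gauss_2d(frame, mean, var, cov=0.015)

print("log-likelihood, streams independent:", independent)
print("log-likelihood, correlation modelled:", joint)
```

With `cov=0.0` the score reduces to the sum of two 1-D log-densities, so the pitch contributes only through its own distribution. A non-zero `cov` lets the model reward or penalise a frame according to how its pitch and cepstral values vary together. In a mixture model, the same per-component score would be weighted and summed over components.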
Cite as: Markov, K.P., Nakagawa, S. (1998) Text-independent speaker recognition using multiple information sources. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0744, doi: 10.21437/ICSLP.1998-223
@inproceedings{markov98_icslp, author={Konstantin P. Markov and Seiichi Nakagawa}, title={{Text-independent speaker recognition using multiple information sources}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0744}, doi={10.21437/ICSLP.1998-223} }