We have been using a phone-based cepstral system with polynomial features in NIST evaluations for the past two years. This system uses three broad phone classes, three states per class, and third-order polynomial features obtained from MFCC features. In this paper, we present a complete analysis of the system. We start from a simpler system that does not use phones or states and show that the addition of phones gives a significant improvement. We show that adding state information does not provide improvement on its own but provides a significant improvement when used with phone classes. We complete the system by applying nuisance attribute projection (NAP) and score normalization. We show that splitting features after a joint NAP over all phone classes results in a significant improvement. Overall, we obtain about 25% performance improvement with polynomial features based on phones and states, and obtain a system with performance comparable to a state-of-the-art SVM system.
Bibliographic reference. Kajarekar, Sachin S. (2008): "Phone-based cepstral polynomial SVM system for speaker recognition", In INTERSPEECH-2008, 845-848.