This paper examines the performance of dynamic features in automatic speaker verification. In particular, we consider feature combinations at the vector level, combining static with velocity and acceleration representations derived from regression analysis. Previous work has shown that these sets can be usefully combined at the model level, with a separate model for each feature type. However, in the case of a multi-layer perceptron (MLP) classifier, the vector level of combination might be more appropriate. We consider both word-serial and word-parallel inputs to the MLP, showing that the former is generally better. While the MLP is shown to give slightly better results than a vector quantization codebook approach, the inclusion of the dynamic features does not result in a significant improvement in performance. This is attributed to limitations in the net training.
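As an illustration of vector-level combination, the following minimal sketch derives velocity and acceleration features by first-order regression over a sliding window and concatenates them with the static features frame by frame. It is not the authors' implementation: the function names, the half-window of 2 frames, and the edge handling (repeating the first and last frame) are all assumptions, since the abstract does not specify them.

```python
from typing import List

Frame = List[float]


def regression_deltas(frames: List[Frame], half_window: int = 2) -> List[Frame]:
    """Regression slope of each coefficient over 2*half_window+1 frames.

    Assumed form: d_t = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2),
    with the first and last frames repeated at the sequence edges.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    n = len(frames)
    if n == 0:
        return []
    denom = 2.0 * sum(k * k for k in range(1, half_window + 1))
    dim = len(frames[0])
    deltas: List[Frame] = []
    for t in range(n):
        row: Frame = []
        for d in range(dim):
            num = 0.0
            for k in range(1, half_window + 1):
                ahead = frames[min(t + k, n - 1)][d]   # clamp at last frame
                behind = frames[max(t - k, 0)][d]      # clamp at first frame
                num += k * (ahead - behind)
            row.append(num / denom)
        deltas.append(row)
    return deltas


def combine_at_vector_level(static: List[Frame], half_window: int = 2) -> List[Frame]:
    """Concatenate static, velocity and acceleration features per frame."""
    velocity = regression_deltas(static, half_window)
    # Acceleration is taken here as the regression slope of the velocity track.
    acceleration = regression_deltas(velocity, half_window)
    return [s + v + a for s, v, a in zip(static, velocity, acceleration)]


if __name__ == "__main__":
    # Hypothetical one-dimensional feature track rising by 1 per frame.
    ramp = [[float(i)] for i in range(7)]
    for frame in combine_at_vector_level(ramp):
        print(frame)
```

Each output frame is three times the static dimension, so a single classifier sees all three feature types at once. Model-level combination would instead train one model per feature type and merge their scores.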
Bibliographic reference. Zhang, X. / Mason, J. S. / Andrews, E. C. (1991): "Multiple dynamic features to enhance neural net based speaker verification", In EUROSPEECH-1991, 1411-1414.