4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Unknown-Multiple Signal Source Clustering Problem Using Ergodic HMM and Applied to Speaker Classification

J. Murakami (1), M. Sugiyama (2), H. Watanabe (3)

(1) NTT Information and Communication Systems Laboratories, Japan
(2) The University of Aizu, School of Computer Science and Engineering, Japan
(3) ATR Interpreting Telecommunication Research Laboratories, Kyoto, Japan

In this paper, we consider signals originating from a sequence of sources. More specifically, we address the problems of segmenting such signals and relating the segments to their sources. This issue has applications in many fields. The paper describes a method based on an ergodic Hidden Markov Model (HMM), in which each HMM state corresponds to a signal source. The signal source sequence can be determined by applying a decoding procedure (the Viterbi algorithm or the Forward algorithm) to the observed sequence. Baum-Welch training is used to estimate the HMM parameters from the training material. As an example of the multiple signal source classification problem, an experiment is performed on unknown speaker classification. The results show a classification rate of 79% for four male speakers. The results also indicate that the model is sensitive to the initial values of the ergodic HMM and that employing the long-distance LPC cepstrum is effective for signal preprocessing.
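
The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar observations with one Gaussian per state, and the parameters (`means`, `variances`, `log_trans`, `log_init`) are hand-set for illustration rather than estimated by Baum-Welch training. The function name `viterbi` and the toy data are likewise hypothetical. Every state can follow every other state (an ergodic topology), and the decoded state sequence is read as the source sequence.

```python
import math


def viterbi(obs, log_init, log_trans, means, variances):
    """Most likely state (source) sequence for 1-D Gaussian emissions."""
    n_states = len(log_init)

    def log_gauss(x, mean, var):
        # Log-density of a univariate Gaussian.
        return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

    # delta[s]: best log-probability of any path ending in state s.
    delta = [log_init[s] + log_gauss(obs[0], means[s], variances[s])
             for s in range(n_states)]
    backptr = []
    for x in obs[1:]:
        prev, delta, step_ptr = delta, [], []
        for s in range(n_states):
            # Ergodic topology: every state r may precede state s.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            step_ptr.append(best)
            delta.append(prev[best] + log_trans[best][s]
                         + log_gauss(x, means[s], variances[s]))
        backptr.append(step_ptr)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (assumed values): two sources with well-separated means
# and "sticky" transitions that favor staying with the same source.
means = [0.0, 5.0]
variances = [1.0, 1.0]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
obs = [0.1, -0.2, 0.3, 5.1, 4.8, 5.2, 0.0, 0.2]

path = viterbi(obs, log_init, log_trans, means, variances)
print(path)
```

Runs of identical state indices in `path` correspond to segments attributed to a single source. In the setting of the paper, the observations would be feature vectors (such as LPC cepstra) rather than scalars, and the model parameters would come from training.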


Bibliographic reference.  Murakami, J. / Sugiyama, M. / Watanabe, H. (1996): "Unknown-multiple signal source clustering problem using ergodic HMM and applied to speaker classification", In ICSLP-1996, 2407-2410.