INTERSPEECH 2004 - ICSLP
This paper proposes a novel method for two-speaker audio segmentation that builds a model for each speaker on the fly from the available data. The approach can be viewed as building a Hidden Markov Model (HMM) for the data, with the speakers abstracted as the hidden states. A clustering technique based on the Generalized Likelihood Ratio (GLR) metric is described for initializing each Gaussian Mixture Model (GMM) so that each state corresponds to a single speaker rather than to noise, silence, or word classes. Finally, a refinement method similar to Viterbi training of HMMs is presented. Because the proposed method requires neither prior knowledge of speaker characteristics nor tuning of threshold parameters, it can be applied with confidence to new data sets. On the files reported for the baseline system, the method reduces the error rate by 84.75%. It performs well even when each speaker segment is only 1 s long.
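The GLR-based clustering step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions made here. Each cluster is modeled by a single diagonal-covariance Gaussian rather than a GMM. Segments are merged greedily bottom-up until two clusters (one per speaker) remain. The names `log_det_diag`, `glr_distance`, and `cluster_into_two`, along with the variance floor, are hypothetical. With maximum-likelihood estimates, the constant terms of the Gaussian log-likelihood cancel, so the log GLR reduces to a weighted difference of log-determinants.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. cepstral coefficients); assumed


def log_det_diag(frames: List[Frame], floor: float = 1e-6) -> float:
    """Log-determinant of the ML diagonal covariance of `frames`.

    Variances below `floor` (a hypothetical safeguard) are clamped to avoid log(0).
    """
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, floor))
    return total


def glr_distance(x1: List[Frame], x2: List[Frame]) -> float:
    """Log GLR of "two separate Gaussians" versus "one pooled Gaussian".

    d = (n/2) log|S| - (n1/2) log|S1| - (n2/2) log|S2|, with n = n1 + n2.
    Small values suggest both segments come from the same speaker.
    """
    n1, n2 = len(x1), len(x2)
    return 0.5 * ((n1 + n2) * log_det_diag(x1 + x2)
                  - n1 * log_det_diag(x1)
                  - n2 * log_det_diag(x2))


def cluster_into_two(segments: List[List[Frame]]) -> List[int]:
    """Merge the closest pair of clusters (by GLR) until two remain.

    Returns one cluster label (0 or 1) per input segment.
    """
    clusters = [[i] for i in range(len(segments))]

    def frames_of(cluster: List[int]) -> List[Frame]:
        return [f for i in cluster for f in segments[i]]

    while len(clusters) > 2:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = glr_distance(frames_of(clusters[a]), frames_of(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)

    labels = [0] * len(segments)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


# Toy 1-D example: segments 0 and 2 sit near 0.0, segments 1 and 3 near 10.0.
segments = [
    [[0.0], [0.4], [-0.4], [0.1]],
    [[10.0], [10.3], [9.7], [10.1]],
    [[0.2], [-0.2], [0.3], [-0.1]],
    [[9.9], [10.2], [9.8], [10.0]],
]
print(cluster_into_two(segments))  # [0, 1, 0, 1]
```

A small distance means one shared Gaussian explains both segments almost as well as two separate ones, so the segments likely come from the same speaker. The resulting two clusters would then seed the per-speaker models. The Viterbi-style refinement described in the abstract is not shown here.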
Bibliographic reference. Gangadharaiah, Rashmi / Narayanaswamy, Balakrishnan / Balakrishnan, Narayanaswamy (2004): "A novel method for two-speaker segmentation", In INTERSPEECH-2004, 2337-2340.