Speaker recognition by clustering the speech signals of the same speaker is an important speech analysis task in various applications. Recent work has shown that speaker utterances lie on an underlying manifold in the model-parameter space. However, most speaker clustering methods operate in Euclidean space, and hence often fail to discover the intrinsic geometrical structure of the data space. To address this problem, we convert the speaker i-vector representation of utterances in Euclidean space into a network constructed from the local k-nearest-neighbor relationships among these signals. We then propose a community detection model on this network for clustering the signals. The new model is based on probabilistic community memberships, and is further refined with the following idea: if two connected nodes have a high similarity, their community membership distributions in the model should be made close. This refinement enforces the local invariance assumption, and thus respects the structure of the underlying manifold better than existing community detection methods. Experiments are conducted on a speaker content network built from a Chinese speaker recognition database. The results confirm the effectiveness of the new method.
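As a rough illustration of the network construction step described above, the following minimal sketch builds a k-nearest-neighbor graph from a set of vectors. It is not the authors' implementation: the function name `knn_graph`, the use of cosine similarity as the edge weight, the clipping of negative similarities to zero, and the symmetrization rule are all assumptions made for this example.

```python
import numpy as np


def knn_graph(vectors, k):
    """Return a symmetric weighted adjacency matrix for a k-NN graph.

    Hypothetical helper: each row of `vectors` stands in for one
    utterance-level i-vector. Rows are assumed to be nonzero.
    """
    x = np.asarray(vectors, dtype=float)
    # Length-normalize so that the dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    # Exclude self-matches from the neighbor search.
    np.fill_diagonal(sim, -np.inf)

    n = sim.shape[0]
    # Column indices of the k most similar rows, for every row.
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()

    adjacency = np.zeros((n, n))
    # Assumption of this sketch: negative similarities are clipped to zero.
    adjacency[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    # Keep an edge if either endpoint selected the other as a neighbor.
    return np.maximum(adjacency, adjacency.T)
```

The resulting matrix could then be passed to a community detection method. In such a setup, the edge weights would play the role of the node similarities that the refinement uses to pull the membership distributions of connected nodes together.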
Bibliographic reference. Wang, Hongcui / Jin, Di / Li, Lantian / Dang, Jianwu (2015): "Community detection with manifold learning on speaker i-vector space for Chinese", In INTERSPEECH-2015, 3021-3025.