In this paper, we propose a word confusion network (WCN) based approach to clustering spoken documents and analyze its ability to handle the influence of speech recognition errors. A WCN compactly represents multiple confidence-weighted recognition hypotheses. It therefore offers scope for improving clustering accuracy, because the correct transcription is likely to be present among the alternative hypotheses in those cases where the 1-best transcript is erroneous. On the other hand, several of the remaining hypotheses are incorrect and hence could pose a challenge during clustering. In our approach, we extract TF-IDF vectors from the WCNs and cluster them using the k-means algorithm. The components of the TF-IDF vectors are further weighted with the word posterior probabilities, with the aim of down-weighting those components contributed by incorrect hypotheses with low posterior probabilities. Experimental results on Switchboard data illustrate the usefulness of the rich information in the WCN for clustering, showing up to 4% absolute improvement in the normalized mutual information metric.
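The pipeline described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything here is an assumption made for illustration: a WCN is modeled as a list of slots, each holding `(word, posterior)` alternatives; term frequency is taken to be the sum of a word's posteriors over all slots; the IDF is a smoothed variant; and the function names (`expected_counts`, `posterior_weighted_tfidf`, `kmeans`) and the toy data are hypothetical. The k-means routine is a plain textbook version with a deterministic initialization, not the authors' configuration.

```python
import math
from collections import defaultdict

def expected_counts(wcn):
    """Posterior-weighted term frequency: sum each word's posteriors over all slots."""
    counts = defaultdict(float)
    for slot in wcn:
        for word, posterior in slot:
            counts[word] += posterior
    return dict(counts)

def posterior_weighted_tfidf(wcns):
    """Return (vocab, vectors) with L2-normalized, posterior-weighted TF-IDF vectors."""
    tfs = [expected_counts(wcn) for wcn in wcns]
    n_docs = len(tfs)
    # Document frequency: number of WCNs in which the word appears as any hypothesis.
    df = defaultdict(int)
    for tf in tfs:
        for word in tf:
            df[word] += 1
    vocab = sorted(df)
    # Smoothed IDF (an assumption; the paper may use a different form).
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for tf in tfs:
        vec = [tf.get(w, 0.0) * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def kmeans(vectors, k, n_iter=20):
    """Plain k-means with squared Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each vector.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy WCNs: two travel-related and two banking-related documents.
# In the third document the 1-best word in the first slot ("fight") is wrong,
# but the correct word ("flight") survives as a lower-posterior alternative.
wcns = [
    [[("flight", 0.9), ("fight", 0.1)], [("ticket", 0.8), ("thicket", 0.2)]],
    [[("bank", 0.7), ("tank", 0.3)], [("loan", 0.9), ("lone", 0.1)]],
    [[("fight", 0.6), ("flight", 0.4)], [("ticket", 0.9), ("wicket", 0.1)]],
    [[("loan", 0.8), ("lawn", 0.2)], [("bank", 0.9), ("bunk", 0.1)]],
]

vocab, vectors = posterior_weighted_tfidf(wcns)
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy input, the third document still shares weight on "flight" with the first one even though its 1-best hypothesis is wrong, which is the effect the abstract attributes to using the alternative hypotheses. Low-posterior alternatives such as "thicket" contribute only small components to the vectors.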
Index Terms: spoken document clustering, word confusion network, posterior weighted TF-IDF vector, k-means clustering
Bibliographic reference. Ikbal, Shajith / Joshi, Sachindra / Verma, Ashish / Deshmukh, Om D. (2012): "Spoken document clustering using word confusion networks", In INTERSPEECH-2012, 1380-1383.