ISCA Archive Interspeech 2023

Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space

Seongmin Park, Jinkyu Seo, Jihwa Lee

We present HyperSeg, a hyperdimensional computing (HDC) approach to unsupervised dialogue topic segmentation. HDC is a class of vector symbolic architectures that leverages the probabilistic orthogonality of randomly drawn vectors at extremely high dimensions (typically over 10,000). HDC generates rich token representations through its low-cost initialization of many unrelated vectors. This is especially beneficial in topic segmentation, which often operates as a resource-constrained pre-processing step for downstream transcript understanding tasks. HyperSeg outperforms the current state of the art in 4 out of 5 segmentation benchmarks, even when baselines are given partial access to the ground truth, and is 10 times faster on average. We show that HyperSeg also improves downstream summarization accuracy. With HyperSeg, we demonstrate the viability of HDC in a major language task. We open-source HyperSeg to provide a strong baseline for unsupervised topic segmentation.
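
The probabilistic orthogonality mentioned in the abstract can be illustrated with a minimal sketch. This is not HyperSeg's implementation: the bipolar {-1, +1} encoding, the function names, and the choice of 20 vectors are all assumptions made for illustration. The sketch draws independent random vectors at dimension 10,000 and measures how close to orthogonal every pair is.

```python
import random


def random_hypervector(dim, rng):
    """Draw a random bipolar hypervector with entries in {-1, +1} (assumed encoding)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity for bipolar vectors of equal length.

    Each bipolar vector has Euclidean norm sqrt(dim), so the product of
    the two norms is simply dim.
    """
    dot = sum(a * b for a, b in zip(u, v))
    return dot / len(u)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
DIM = 10_000            # dimension in the range the abstract describes
NUM_VECTORS = 20        # hypothetical number of token vectors

vectors = [random_hypervector(DIM, rng) for _ in range(NUM_VECTORS)]

# Absolute cosine similarity for every distinct pair of vectors.
pair_sims = [
    abs(cosine(vectors[i], vectors[j]))
    for i in range(NUM_VECTORS)
    for j in range(i + 1, NUM_VECTORS)
]
max_sim = max(pair_sims)
self_sim = cosine(vectors[0], vectors[0])

print(f"largest |cosine| between distinct vectors: {max_sim:.4f}")
print(f"cosine of a vector with itself: {self_sim:.1f}")
```

For independent bipolar vectors the cosine similarity has a standard deviation of 1/sqrt(dim), which is 0.01 at dimension 10,000, so unrelated vectors stay close to orthogonal without any training. That is what makes initializing many unrelated token vectors cheap.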

doi: 10.21437/Interspeech.2023-1859

Cite as: Park, S., Seo, J., Lee, J. (2023) Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space. Proc. INTERSPEECH 2023, 730-734, doi: 10.21437/Interspeech.2023-1859

  author={Seongmin Park and Jinkyu Seo and Jihwa Lee},
  title={{Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space}},
  booktitle={Proc. INTERSPEECH 2023},