ISCA Archive Interspeech 2017

Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications

Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty

Zero resource speech processing refers to a scenario where little or no transcribed data is available. In this paper, we propose a three-step unsupervised approach to zero resource speech processing that does not require any additional information or datasets. In the first step, we segment the speech signal into phoneme-like units, resulting in a large number of varying-length segments. The second step clusters the varying-length segments into a finite number of clusters so that each segment can be labeled with a cluster index. The unsupervised transcriptions thus obtained can be thought of as sequences of virtual phone labels. In the third step, a deep neural network (DNN) classifier is trained to map the feature vectors extracted from the signal to their corresponding virtual phone labels. The virtual phone posteriors extracted from the DNN are used as features for zero resource speech processing. The effectiveness of the proposed approach is evaluated on both ABX and spoken term discovery (STD) tasks using the spontaneous American English and Tsonga datasets provided as part of the Zero Resource Speech Challenge 2015. The proposed system outperforms the baselines supplied with the datasets on both tasks, without any task-specific modifications.
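The three steps can be illustrated with a minimal Python sketch on toy two-dimensional "frames". The abstract does not specify the segmentation or clustering algorithms, so everything here is an assumption made for illustration: boundaries are placed by a simple frame-distance threshold, segments are mean-pooled and grouped with k-means, and a single-layer softmax classifier stands in for the DNN. All function names are hypothetical.

```python
import math

def segment(frames, threshold):
    # Step 1 (assumed heuristic): boundary where consecutive frames differ sharply.
    bounds = [0]
    for t in range(1, len(frames)):
        if math.dist(frames[t - 1], frames[t]) > threshold:
            bounds.append(t)
    bounds.append(len(frames))
    return list(zip(bounds[:-1], bounds[1:]))  # (start, end) pairs

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=10):
    # Step 2 (assumed): k-means with farthest-first initialisation.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = lambda p: min(range(k), key=lambda c: math.dist(p, centroids[c]))
    for _ in range(iters):
        labels = [assign(p) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return [assign(p) for p in points]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def posteriors(W, x):
    xb = list(x) + [1.0]  # append a constant 1 for the bias term
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])

def train_softmax(frames, labels, k, lr=0.1, epochs=50):
    # Step 3 stand-in: linear softmax classifier trained with SGD (not a DNN).
    W = [[0.0] * (len(frames[0]) + 1) for _ in range(k)]
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            p = posteriors(W, x)
            xb = list(x) + [1.0]
            for c in range(k):
                g = p[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                W[c] = [w - lr * g * v for w, v in zip(W[c], xb)]
    return W

# Toy signal: four steady regions of 10 frames alternating between two "sounds".
frames = [((5.0 if (t // 10) % 2 else 0.0) + 0.1 * math.sin(t),
           (5.0 if (t // 10) % 2 else 0.0) + 0.1 * math.cos(t)) for t in range(40)]

segs = segment(frames, threshold=1.0)
seg_labels = kmeans([mean_vector(frames[s:e]) for s, e in segs], k=2)
# Every frame inherits the virtual phone label of its segment.
frame_labels = [seg_labels[i] for i, (s, e) in enumerate(segs) for _ in range(s, e)]
W = train_softmax(frames, frame_labels, k=2)
post = posteriors(W, frames[15])  # virtual phone posterior used as a feature
```

In this sketch, the posterior vector returned by `posteriors` plays the role of the per-frame feature that would be passed to a downstream ABX or spoken term discovery system.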

doi: 10.21437/Interspeech.2017-1476

Cite as: Bhati, S., Nayak, S., Murty, K.S.R. (2017) Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications. Proc. Interspeech 2017, 2133-2137, doi: 10.21437/Interspeech.2017-1476

  author={Saurabhchand Bhati and Shekhar Nayak and K. Sri Rama Murty},
  title={{Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications}},
  booktitle={Proc. Interspeech 2017},