11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Discovering an Optimal Set of Minimally Contrasting Acoustic Speech Units: A Point of Focus for Whole-Word Pattern Matching

Guillaume Aimetti (1), Roger K. Moore (1), Louis ten Bosch (2)

(1) University of Sheffield, UK
(2) Radboud Universiteit Nijmegen, The Netherlands

This paper presents a computational model that can automatically learn words, made up of emergent sub-word units, with no prior linguistic knowledge. The research is inspired by current cognitive theories of human speech perception and therefore strives for ecological plausibility, with the aim of building more robust speech recognition technology. First, the particulate structure of the raw acoustic speech signal is derived through a novel acoustic segmentation process, the 'acoustic DP-ngram algorithm'. Then, using a cross-modal association learning mechanism, word models are derived as sequences of the segmented units. An efficient set of sub-word units emerges as a result of a general-purpose lossy compression mechanism and the algorithm's sensitivity to discriminate acoustic differences. The results show that the system can automatically derive robust word representations and dynamically build re-usable sub-word acoustic units with no pre-defined language-specific rules.


Bibliographic reference. Aimetti, Guillaume / Moore, Roger K. / Bosch, Louis ten (2010): "Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching", in INTERSPEECH-2010, 310-313.