14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Speech Acoustic Unit Segmentation Using Hierarchical Dirichlet Processes

Amir Hossein Harati Nejad Torbati, Joseph Picone, Marc Sobel

Temple University, USA

Speech recognition systems have historically used context-dependent phones as acoustic units because these units allow linguistic information, such as a pronunciation lexicon, to be leveraged. However, when dealing with a new language for which minimal linguistic resources exist, it is desirable to discover acoustic units automatically. The process of discovering acoustic units usually consists of two stages: segmentation and clustering. In this paper, we focus on the segmentation portion of this problem. We introduce a nonparametric Bayesian approach to segmentation based on Hierarchical Dirichlet Processes (HDP), in which a hidden Markov model (HMM) with an unbounded number of states is used to segment the utterance. This model is referred to as an HDP-HMM. We compare this algorithm to several popular heuristic methods and demonstrate an 11% improvement in finding boundaries on the TIMIT corpus. A self-similarity measure computed over segments shows an 88% improvement compared to manual segmentation with comparable segment length. This work represents the first step in the development of a speech recognition system that is based entirely on nonparametric Bayesian models.
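The abstract does not spell out how segments are read off the HDP-HMM or how boundary accuracy is scored. The following is a minimal sketch under stated assumptions, not the paper's exact protocol: a boundary is placed wherever the decoded frame-level state label changes, frames are assumed to be 10 ms apart, and hypothesized boundaries are matched one-to-one to reference boundaries within an assumed ±20 ms tolerance. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def state_changes_to_boundaries(states: Sequence[int],
                                frame_shift_ms: float = 10.0) -> List[float]:
    """Place a boundary (in ms) at every frame where the decoded state label changes.

    Assumption: a fixed frame shift (10 ms by default) maps frame index to time.
    """
    return [i * frame_shift_ms
            for i in range(1, len(states))
            if states[i] != states[i - 1]]


def boundary_precision_recall(hyp: Sequence[float],
                              ref: Sequence[float],
                              tol_ms: float = 20.0) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of hypothesized to reference boundaries.

    Each hypothesized boundary claims the closest still-unmatched reference
    boundary lying within +/- tol_ms (the tolerance is an assumption).
    Returns (precision, recall, F1).
    """
    unmatched_ref = sorted(ref)
    hits = 0
    for h in sorted(hyp):
        best = None
        for r in unmatched_ref:
            if abs(r - h) <= tol_ms and (best is None or abs(r - h) < abs(best - h)):
                best = r
        if best is not None:
            unmatched_ref.remove(best)  # a reference boundary can be matched only once
            hits += 1
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy decoded state sequence (hypothetical), one label per 10 ms frame.
    hyp = state_changes_to_boundaries([0, 0, 0, 3, 3, 1, 1, 1, 1, 2])
    print(hyp)  # [30.0, 50.0, 90.0]
    print(boundary_precision_recall(hyp, [30.0, 80.0]))
```

In this toy case the hypothesized boundaries at 30 ms and 90 ms fall within tolerance of the references at 30 ms and 80 ms, while 50 ms is an insertion, giving precision 2/3, recall 1.0, and F1 of about 0.8. Because an HDP-HMM can instantiate as many states as the data support, the number of hypothesized boundaries is not fixed in advance; the matching tolerance then governs how strictly near-misses are counted.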


Bibliographic reference. Torbati, Amir Hossein Harati Nejad / Picone, Joseph / Sobel, Marc (2013): "Speech acoustic unit segmentation using hierarchical Dirichlet processes", In INTERSPEECH-2013, 637-641.