ISCA Archive Interspeech 2005

Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units

V. Ramasubramanian, P. Srinivas, T. V. Sreenivas

We propose a stochastic pronunciation model using an ergodic hidden Markov model (EHMM) of automatically derived acoustic sub-word units (SWUs). The proposed EHMM discovers the pronunciation structure inherent in the acoustic training data of a word without any a priori phonetic transcriptions. The EHMM is an HMM of HMMs: its states are SWU HMMs, and its state transitions compose the various possible lexicons. The EHMM parameters are estimated by an iterative segmental K-means procedure that jointly optimizes the sub-word units (states) and the pronunciation structure parameters (state transitions). The EHMM-based pronunciation model is evaluated on an English isolated word recognition task with 70 speakers drawn from 8 highly different first languages. Results show that the EHMM learns the lexicon distribution over the population of speakers for each word, thereby effectively modeling inter-speaker pronunciation variability. The EHMM offers an absolute improvement of 8% in word recognition accuracy over the performance of a single most likely lexicon.
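
The joint estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes a loud simplification: each state is a single unit-variance Gaussian (a mean vector) rather than a full SWU HMM. The function names `viterbi`, `segmental_kmeans` and `unit_sequence`, the uniform initialisation and the `floor` smoothing count are all assumptions made for illustration. The loop alternates between Viterbi segmentation of every training utterance and re-estimation of the state means and the ergodic transition probabilities from the resulting alignments.

```python
import math


def viterbi(frames, means, log_trans, log_init):
    """Best state path through a fully connected (ergodic) model.

    Assumption: each state emits from a unit-variance Gaussian around its mean.
    """
    n_states = len(means)

    def log_emit(x, m):
        # Log-likelihood up to an additive constant shared by all states.
        return -0.5 * sum((a - b) ** 2 for a, b in zip(x, m))

    score = [log_init[s] + log_emit(frames[0], means[s]) for s in range(n_states)]
    back = []
    for x in frames[1:]:
        prev, ptr, score = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit(x, means[s]))
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segmental_kmeans(utterances, means, n_iter=10, floor=1e-3):
    """Alternate Viterbi segmentation and re-estimation of means and transitions."""
    n, dim = len(means), len(means[0])
    log_trans = [[math.log(1.0 / n)] * n for _ in range(n)]
    log_init = [math.log(1.0 / n)] * n
    paths = []
    for _ in range(n_iter):
        # Segmentation step: align every utterance with the current model.
        paths = [viterbi(u, means, log_trans, log_init) for u in utterances]
        # Re-estimation step: counts are floored so no transition gets zero mass.
        sums = [[0.0] * dim for _ in range(n)]
        counts = [0] * n
        trans = [[floor] * n for _ in range(n)]
        init = [floor] * n
        for u, p in zip(utterances, paths):
            init[p[0]] += 1
            for x, s in zip(u, p):
                counts[s] += 1
                for d in range(dim):
                    sums[s][d] += x[d]
            for a, b in zip(p, p[1:]):
                trans[a][b] += 1
        # States that received no frames keep their previous mean.
        means = [[v / counts[s] for v in sums[s]] if counts[s] else means[s]
                 for s in range(n)]
        log_trans = [[math.log(c / sum(row)) for c in row] for row in trans]
        log_init = [math.log(c / sum(init)) for c in init]
    # The returned paths are the alignments used in the final re-estimation.
    return means, log_trans, log_init, paths


def unit_sequence(path):
    """Collapse a frame-level state path into its sequence of distinct units."""
    return [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
```

Under these assumptions, calling `segmental_kmeans` on all training utterances of one word yields a transition matrix that acts as a distribution over unit sequences, and `unit_sequence` applied to a decoded path gives the unit string that one utterance followed.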

doi: 10.21437/Interspeech.2005-492

Cite as: Ramasubramanian, V., Srinivas, P., Sreenivas, T.V. (2005) Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units. Proc. Interspeech 2005, 1361-1364, doi: 10.21437/Interspeech.2005-492

  author={V. Ramasubramanian and P. Srinivas and T. V. Sreenivas},
  title={{Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units}},
  booktitle={Proc. Interspeech 2005},