ISCA Archive Interspeech 2009

Learning lexicons from spoken utterances based on statistical model selection

Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano

This paper proposes a method for the unsupervised learning of lexicons from pairs consisting of a spoken utterance and an object that serves as its meaning, without any a priori linguistic knowledge other than a phoneme acoustic model. To obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length principle. This model consists of a list of word phoneme sequences and three statistical models: the phoneme acoustic model, a word-bigram model, and a word meaning model. Experimental results show that the method can acquire acoustically, grammatically and semantically appropriate words with about 85% phoneme accuracy.
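
As a rough illustration of model selection under the minimum description length principle, the sketch below scores candidate models with a generic two-part criterion (negative log-likelihood plus a parameter-count penalty) and keeps the one with the shortest description. The abstract does not give the paper's exact formulation, so this criterion, the function names `description_length` and `select_by_mdl`, and all numbers are assumptions made for illustration only.

```python
import math


def description_length(log_likelihood, num_params, num_samples):
    """Generic two-part MDL score in nats (an assumed form, not the paper's).

    Data cost is the negative log-likelihood; model cost grows with the
    number of free parameters and the log of the sample count.
    """
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)


def select_by_mdl(candidates, num_samples):
    """Return the name of the candidate with the smallest description length.

    `candidates` maps a name to a (log_likelihood, num_params) tuple.
    """
    return min(
        candidates,
        key=lambda name: description_length(*candidates[name], num_samples),
    )


# Hypothetical candidate lexicons with made-up likelihoods and sizes.
candidates = {
    "small_lexicon": (-1200.0, 40),
    "large_lexicon": (-1150.0, 90),
}
best = select_by_mdl(candidates, num_samples=500)
print(best)
```

With these made-up numbers the larger lexicon fits the data slightly better, but its extra parameters cost more than the likelihood gain, so the smaller lexicon is selected.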


doi: 10.21437/Interspeech.2009-698

Cite as: Taguchi, R., Iwahashi, N., Nose, T., Funakoshi, K., Nakano, M. (2009) Learning lexicons from spoken utterances based on statistical model selection. Proc. Interspeech 2009, 2731-2734, doi: 10.21437/Interspeech.2009-698

@inproceedings{taguchi09_interspeech,
  author={Ryo Taguchi and Naoto Iwahashi and Takashi Nose and Kotaro Funakoshi and Mikio Nakano},
  title={{Learning lexicons from spoken utterances based on statistical model selection}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2731--2734},
  doi={10.21437/Interspeech.2009-698}
}