10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Learning Lexicons from Spoken Utterances Based on Statistical Model Selection

Ryo Taguchi (1), Naoto Iwahashi (2), Takashi Nose (3), Kotaro Funakoshi (4), Mikio Nakano (4)

(1) ATR, Japan
(2) NICT, Japan
(3) Tokyo Institute of Technology, Japan
(4) Honda Research Institute Japan Co. Ltd., Japan

This paper proposes a method for the unsupervised learning of lexicons from pairs, each consisting of a spoken utterance and an object representing its meaning, without any a priori linguistic knowledge other than a phoneme acoustic model. To obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length (MDL) principle. This model consists of a list of word phoneme sequences and three statistical models: the phoneme acoustic model, a word-bigram model, and a word meaning model. Experimental results show that the method can acquire acoustically, grammatically and semantically appropriate words with about 85% phoneme accuracy.
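The abstract does not give the exact MDL criterion, so the following is only a minimal sketch of MDL-style model selection under a stated assumption: the description length is approximated by the common two-part form, negative log-likelihood of the data plus (k/2) log N for k free parameters and N training pairs. The class `CandidateLexicon` and the functions `description_length` and `select_lexicon` are hypothetical names introduced for illustration; they are not the authors' implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class CandidateLexicon:
    """Hypothetical summary of one candidate model (word list + statistical models)."""
    words: list[str]        # word phoneme sequences in this candidate lexicon
    log_likelihood: float   # assumed: natural-log likelihood of all (utterance, object) pairs
    num_params: int         # assumed: free parameters of the bigram and meaning models


def description_length(model: CandidateLexicon, num_samples: int) -> float:
    """Assumed two-part MDL approximation: data cost + (k/2) * log N model cost."""
    return -model.log_likelihood + 0.5 * model.num_params * math.log(num_samples)


def select_lexicon(candidates: list[CandidateLexicon], num_samples: int) -> CandidateLexicon:
    """Pick the candidate whose total description length is smallest."""
    return min(candidates, key=lambda m: description_length(m, num_samples))


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not results from the paper.
    small = CandidateLexicon(["a k a", "b o r u"], log_likelihood=-1200.0, num_params=40)
    large = CandidateLexicon(["a k a", "b o r u", "a k a b o"], log_likelihood=-1150.0, num_params=120)
    best = select_lexicon([small, large], num_samples=100)
    print(best.words)
```

With these illustrative numbers the larger lexicon fits the data better, but its extra parameters cost more than the likelihood gain, so the smaller lexicon has the shorter description length and is selected. This trade-off between fit and model size is what lets an MDL criterion reject spurious word candidates.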

Bibliographic reference.  Taguchi, Ryo / Iwahashi, Naoto / Nose, Takashi / Funakoshi, Kotaro / Nakano, Mikio (2009): "Learning lexicons from spoken utterances based on statistical model selection", In INTERSPEECH-2009, 2731-2734.