Sixth International Conference on Spoken Language Processing
This paper describes an algorithm for spoken language acquisition through a human-robot interface based on speech, vision, and behavior. In this algorithm, the grounded language knowledge is represented by graphical statistical models consisting of hidden Markov models and a stochastic context-free grammar. The learning of the lexicon is based on the independence between speech and visual features in each lexical item. In the grammar-learning process, the syntactic structure of each spoken utterance is inferred from the conceptual structure extracted from the visual observation. The algorithm is robust against ambiguity and sparseness of learning data because it is based on information-theoretical learning.
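As a rough illustration of the stochastic context-free grammar mentioned above, the following sketch computes the probability of one derivation as the product of the probabilities of the rules it uses. The grammar, its symbols (`S`, `ACTION`, `OBJECT`), the words, and the probabilities are all hypothetical and are not taken from the paper; the paper's own models and learning procedure may differ.

```python
# Hypothetical toy grammar (not from the paper): each rule (lhs, rhs) has a
# probability, and the probabilities sharing a left-hand side sum to 1.
RULES = {
    ("S", ("ACTION", "OBJECT")): 0.6,
    ("S", ("OBJECT", "ACTION")): 0.4,
    ("ACTION", ("move",)): 0.7,
    ("ACTION", ("lift",)): 0.3,
    ("OBJECT", ("box",)): 0.5,
    ("OBJECT", ("ball",)): 0.5,
}


def derivation_probability(tree):
    """Return the probability of a derivation tree.

    A tree is a pair (label, children); each child is either another tree
    or a plain string standing for a terminal word.
    """
    label, children = tree
    # The right-hand side is the sequence of child labels or words.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(label, rhs)]
    # Multiply in the probabilities of the sub-derivations.
    for child in children:
        if not isinstance(child, str):
            p *= derivation_probability(child)
    return p


tree = ("S", [("ACTION", ["move"]), ("OBJECT", ["box"])])
prob = derivation_probability(tree)  # 0.6 * 0.7 * 0.5
```

Under these assumed numbers, the word order "move box" is preferred over "box move" only because the first `S` rule carries the higher probability; in a learned grammar such probabilities would be estimated from data.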
Bibliographic reference. Iwahashi, Naoto (2000): "Language acquisition through a human-robot interface", In ICSLP-2000, vol.3, 442-447.