INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Estimating the Potential of Signal and Interlocutor-Track Information for Language Modeling

Nigel G. Ward, Benjamin H. Walker

University of Texas at El Paso, USA

Although most language models today treat language purely as word sequences, there is recurring interest in tapping new sources of information, such as disfluencies, prosody, the interlocutor's dialog act, and the interlocutor's recent words. To estimate the potential value of such sources of information, we extend Shannon's guessing-game method for estimating entropy to spoken dialog. Four teams of two subjects each predicted the next word in a dialog given various amounts of context: one word, two words, all the words spoken so far, or the full dialog audio so far. The entropy benefit of the full-audio condition over the full-text condition was substantial, .64 bits per word, greater than the .54-bit benefit of full-text context over trigrams. This suggests that language models may be improved by using the speaker's prosody and context from the interlocutor.
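For readers unfamiliar with the guessing game: subjects guess the next item until they get it right, and the distribution of how many guesses were needed yields bounds on the per-item entropy. The sketch below computes Shannon's classic upper bound, -Σ q_i log2 q_i, and lower bound, Σ i (q_i - q_{i+1}) log2 i, where q_i is the fraction of trials in which the correct item was found on guess i. It is a minimal illustration under stated assumptions: it covers only the original single-predictor bounds, not the dialog-specific extension (two-person teams, audio context) described above, whose exact scoring is not given here. The function names and the toy guess counts are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def guess_rank_distribution(ranks):
    """Return [q_1, ..., q_K]: q_i is the fraction of trials solved on guess i.

    `ranks` holds, for each predicted word, the guess number (1-based) on
    which the subject(s) produced the correct word. Hypothetical helper.
    """
    counts = Counter(ranks)
    n = len(ranks)
    max_rank = max(ranks)
    return [counts.get(i, 0) / n for i in range(1, max_rank + 1)]


def shannon_bounds(ranks):
    """Return (lower, upper) entropy bounds in bits per item (Shannon 1951 style)."""
    q = guess_rank_distribution(ranks)
    # Upper bound: entropy of the guess-number distribution itself.
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i), with q_{K+1} = 0.
    q_ext = q + [0.0]
    lower = sum(
        i * (q_ext[i - 1] - q_ext[i]) * math.log2(i)
        for i in range(1, len(q) + 1)
    )
    return lower, upper


if __name__ == "__main__":
    # Toy guess counts for eight predicted words (illustrative only).
    toy_ranks = [1, 1, 1, 1, 2, 2, 3, 4]
    lo, hi = shannon_bounds(toy_ranks)
    print(f"lower = {lo:.2f} bits, upper = {hi:.2f} bits")  # lower = 1.25, upper = 1.75
```

Under this framing, the benefit of one context condition over another (for example, full audio versus full text) could be expressed as the difference between the entropy estimates computed from each condition's guess counts; how the authors combined or capped guesses is an assumption left open here.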


Bibliographic reference.  Ward, Nigel G. / Walker, Benjamin H. (2009): "Estimating the potential of signal and interlocutor-track information for language modeling", In INTERSPEECH-2009, 160-163.