Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Unified Architecture for Auditory Scene Analysis and Spoken Language Processing

Tomohiro Nakatani, Takeshi Kawabata, Hiroshi G. Okuno

NTT Basic Research Laboratories, Kanagawa, Japan

We propose a unified architecture for auditory scene analysis (ASA) and spoken language processing (SLP). The unified system is expected to provide a robust and friendly human-computer interface in real acoustic environments. Because the number of auditory streams can be enormous, an adaptive stream attention mechanism is required. We consider that adaptive understanding (behavior) emerges from competing goals in a multi-agent system, and that such a behavior-based multi-agent system can explain various kinds of activities in human communication. In this paper, as the first stage of this approach, we design and implement a multi-agent-based stream segregation system. The system dynamically generates stream segregation agents, which extract auditory streams incrementally. As cues for segregation, these agents use only simple sound attributes: harmonics and average spectral intensity. To resolve stream interference, each agent communicates with the other agents through signal subtraction and common threshold modification. As a whole, the resulting system segregates streams adaptively from mixed sounds. Experimental results show that the system can segregate two simultaneous voices effectively even under noisy conditions.
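For concreteness, the following is a minimal Python sketch of the residue-driven loop the abstract describes: agents are generated dynamically while salient harmonic structure remains in the residual signal, each agent extracts the energy near its harmonics and subtracts it from the shared residual ("signal subtraction"), and a common salience threshold decides when generation stops. Everything here (the autocorrelation pitch estimator, the StreamAgent class, and the frame-size, bandwidth, and threshold values) is an illustrative assumption operating on single frames, not the authors' implementation, which additionally tracks streams incrementally across frames and modifies the common threshold through inter-agent communication.

import numpy as np

FS = 16000     # sampling rate in Hz (assumed for this sketch)
FRAME = 1024   # analysis frame length in samples (assumed)

def estimate_f0(frame, fs=FS, fmin=80.0, fmax=400.0):
    """Crude autocorrelation pitch estimate standing in for the paper's
    harmonics cue; returns (f0, salience). Taking the first lag within
    90% of the peak suppresses octave (period-doubling) errors."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    cand = ac[lo:hi]
    lag = lo + int(np.argmax(cand >= 0.9 * cand.max()))
    return fs / lag, ac[lag] / (ac[0] + 1e-12)

class StreamAgent:
    """A dynamically generated agent that tracks one harmonic stream."""
    def __init__(self, f0, bandwidth=25.0):
        self.f0 = f0                 # fundamental of the tracked stream
        self.bandwidth = bandwidth   # Hz kept around each harmonic

    def extract(self, frame, fs=FS):
        """Isolate energy near the tracked harmonics. The caller subtracts
        the result from the shared residual, which is the 'signal
        subtraction' communication between agents."""
        spec = np.fft.rfft(frame)
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        harmonics = self.f0 * np.arange(1, int(freqs[-1] / self.f0) + 1)
        near = np.abs(freqs[:, None] - harmonics[None, :]).min(axis=1)
        return np.fft.irfft(spec * (near < self.bandwidth), n=len(frame))

def segregate(frames, threshold=0.3, max_agents=4):
    """Residue-driven loop: while the residual still holds salient harmonic
    structure, spawn an agent and subtract its stream. Here `threshold`
    plays the role of the common threshold the agents modify."""
    streams = []
    for frame in frames:
        residual = np.asarray(frame, dtype=float)
        for _ in range(max_agents):
            f0, salience = estimate_f0(residual)
            if salience < threshold:
                break                          # residual is noise-like
            agent = StreamAgent(f0)
            extracted = agent.extract(residual)
            residual = residual - extracted    # signal subtraction
            streams.append((f0, extracted))
    return streams

# Demo: two synthetic "voices" (harmonic complexes at 125 Hz and 218.75 Hz)
# mixed with noise; the loop typically spawns one agent per voice.
t = np.arange(FRAME) / FS
voice = lambda f0, amp: amp * sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 7))
mix = (voice(125.0, 1.0) + voice(218.75, 0.8)
       + 0.05 * np.random.default_rng(0).standard_normal(FRAME))
for f0, _ in segregate([mix]):
    print(f"stream segregated near {f0:.1f} Hz")

The demo frequencies are chosen to sit on exact FFT bins so that masked subtraction is clean; real voices would require the cross-frame tracking and threshold negotiation the paper describes.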


Bibliographic reference: Nakatani, Tomohiro / Kawabata, Takeshi / Okuno, Hiroshi G. (1994): "Unified architecture for auditory scene analysis and spoken language processing", in ICSLP-1994, 1403-1406.