ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2006)
Pittsburgh, PA, USA
Automatic speech recognition (ASR) is essential for a robot to communicate with people. One of the main problems with ASR for robots is that robots inevitably generate motor noise. The robot's microphones capture this noise at high power because the noise sources are closer to the microphones than the target speech source, so the signal-to-noise ratio of the input speech becomes quite low (less than 0 dB). However, the noise can be estimated from information on the robot's own motions and postures, because a given type of motion/gesture produces almost the same noise pattern every time it is performed. This paper proposes a method to improve ASR under motor noise by using information on the robot's motion/gesture. The method selectively uses three techniques: multi-condition training, maximum-likelihood linear regression (MLLR), and missing feature theory (MFT). The first two techniques cope with motor noise by selecting the noise-type-dependent acoustic model that corresponds to the motion/gesture being performed. The last technique identifies unreliable acoustic features in an input sound by matching the input against pre-recorded noise of the current motion/gesture, and masks them during speech recognition to improve ASR performance. Because the choice of ASR technique affects the system's performance in our method, we evaluated the performance of the three techniques for each noise type of the robot's motions/gestures to obtain the best technique selection rule. Preliminary results on isolated word recognition showed the effectiveness of our method using the obtained technique selection rule.
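The two ideas in the abstract, selecting a technique per motion/gesture and masking features dominated by the known motor noise, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the motion labels, the entries of `SELECTION_RULE`, the function names, the use of per-band log-energies in dB, and the 3 dB margin. The abstract does not say how the input is matched against the pre-recorded noise, so a simple per-band threshold stands in for that step.

```python
from typing import Dict, List

# Hypothetical selection rule: motion/gesture label -> technique name.
# The paper derives its rule experimentally; these entries are invented.
SELECTION_RULE: Dict[str, str] = {
    "head_nod": "mllr",
    "arm_wave": "mft",
}
DEFAULT_TECHNIQUE = "multi_condition"


def select_technique(motion: str) -> str:
    """Return the technique assigned to a motion, or the default if unknown."""
    return SELECTION_RULE.get(motion, DEFAULT_TECHNIQUE)


def reliability_mask(
    input_frames: List[List[float]],
    noise_frames: List[List[float]],
    margin_db: float = 3.0,
) -> List[List[int]]:
    """Build a binary missing-feature mask (1 = reliable, 0 = masked).

    Both arguments are lists of frames holding per-band log-energies in dB
    (an assumption). A band is kept only if the input exceeds the
    pre-recorded noise template by more than margin_db.
    """
    mask = []
    for x_frame, n_frame in zip(input_frames, noise_frames):
        mask.append(
            [1 if x - n > margin_db else 0 for x, n in zip(x_frame, n_frame)]
        )
    return mask


if __name__ == "__main__":
    observed = [[10.0, 2.0, 8.0], [1.0, 9.0, 4.0]]
    template = [[2.0, 1.0, 7.0], [0.0, 3.0, 4.0]]
    print(select_technique("arm_wave"))
    print(reliability_mask(observed, template))
```

In a full system, the mask would be passed to a recognizer that ignores or marginalizes the masked features, while the `mllr` and `multi_condition` branches would instead load an acoustic model adapted to, or trained on, the noise of the current motion.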
Bibliographic reference. Nishimura, Yoshitaka / Nakano, Mikio / Nakadai, Kazuhiro / Tsujino, Hiroshi / Ishizuka, Mitsuru (2006): "Speech recognition for a robot under its motor noises by selective application of missing feature theory and MLLR", In SAPA-2006, 53-58.