ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2006)
Pittsburgh, PA, USA
This paper addresses automatic speech recognition (ASR) for robots integrated with sound source separation (SSS), using missing feature masks generated from leak noise. Missing feature theory (MFT) is a promising approach to improving the noise robustness of ASR, but automatic generation of the missing feature mask remains an open issue. To improve robot audition, we applied MFT to interface ASR with SSS, which uses multiple microphones to extract a sound source originating from a specific direction. In a robot audition system, using SSS as a pre-processor for ASR is a promising way to deal with any kind of noise. However, ASR usually assumes clean speech input, whereas speech extracted by SSS is inevitably distorted. MFT can be applied to cope with this distortion. In this case, we can assume that the noise in the extracted sounds consists mainly of leakage from other channels. We therefore introduced leak noise based missing feature mask generation, which generates a missing feature mask automatically from information on leak noise obtained from other channels. To assess its effectiveness, we used two SSS methods, geometric source separation (GSS) and independent component analysis (ICA), together with Multiband Julian for MFT-based ASR. The two resulting systems, a GSS-based and an ICA-based robot audition system, were evaluated on recognition of simultaneous speech uttered by two speakers. The results showed that the proposed mask generation worked well in both systems.
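The general idea of deriving a mask from leak information can be illustrated with a minimal sketch. The abstract does not give the actual masking rule, so everything below is an assumption made for illustration: the function name `leak_mask`, the hard (binary) mask, the fixed `leak_ratio` used to estimate leakage from the other separated channels, and the `threshold` on the leak-to-target energy ratio are all hypothetical and are not taken from the paper.

```python
def leak_mask(target_energy, other_energies, leak_ratio=0.25, threshold=0.5):
    """Build a hard missing-feature mask (1 = reliable, 0 = unreliable).

    target_energy:  energies of the extracted source, indexed [frame][band].
    other_energies: list of energy grids (same shape) for the other
                    separated channels.
    leak_ratio:     assumed fraction of each other channel's energy that
                    leaks into the target channel (hypothetical value).
    threshold:      a feature is marked reliable when the estimated leak
                    energy is at most threshold * target energy
                    (hypothetical value).
    """
    mask = []
    for t, frame in enumerate(target_energy):
        mask_row = []
        for f, energy in enumerate(frame):
            # Estimated leak: scaled sum of the other channels' energy
            # in the same time-frequency bin.
            leak = leak_ratio * sum(other[t][f] for other in other_energies)
            mask_row.append(1 if leak <= threshold * energy else 0)
        mask.append(mask_row)
    return mask


# Two frames, two frequency bands, one interfering channel.
target = [[4.0, 1.0], [0.5, 9.0]]
interferer = [[1.0, 8.0], [4.0, 2.0]]
mask = leak_mask(target, [interferer])
```

In this toy input, the bins where the interfering channel is strong relative to the target are marked unreliable, so an MFT-based recognizer would discount them during decoding.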
Bibliographic reference. Yamamoto, Shun'ichi / Takeda, Ryu / Nakadai, Kazuhiro / Nakano, Mikio / Tsujino, Hiroshi / Valin, Jean-Marc / Komatani, Kazunori / Ogata, Tetsuya / Okuno, Hiroshi G. (2006): "Leak energy based missing feature mask generation for ICA and GSS and its evaluation with simultaneous speech recognition", In SAPA-2006, 42-47.