INTERSPEECH 2015
16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Non-Audible Murmur Enhancement Based on Statistical Conversion Using Air- and Body-Conductive Microphones in Noisy Environments

Yusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

NAIST, Japan

Non-Audible Murmur (NAM) is an extremely soft whispered voice detected by a special body-conductive microphone called a NAM microphone. Although NAM is a promising medium for silent speech communication, its quality is significantly degraded by its faint volume and by the spectral changes caused by body-conductive recording. To improve the quality of NAM, several enhancement methods based on statistical voice conversion (VC) techniques have been proposed, and their effectiveness has been confirmed in quiet environments. However, NAM can be expected to be used not only in quiet but also in noisy environments, so it is necessary to develop enhancement methods that also work in those conditions. In this paper, we propose a framework for NAM enhancement that uses not only the NAM microphone but also an air-conductive microphone. Air- and body-conducted NAM signals are used as the input of VC to estimate a more natural-sounding speech signal. To clarify the adverse effects of external noise on the performance of the proposed framework and to investigate the possibility of alleviating them by revising the VC models, we also implement noise-dependent VC models within the proposed framework. Experimental results demonstrate that the proposed framework yields significant improvements in the spectral conversion accuracy and listenability of enhanced speech in both quiet and noisy environments.


Bibliographic reference. Tajiri, Yusuke / Tanaka, Kou / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2015): "Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments", In INTERSPEECH-2015, 2769-2773.