10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Dependent Mapping for Low Bit Rate Coding of Throat Microphone Speech

Joseph M. Anand (1), B. Yegnanarayana (1), Sanjeev Gupta (2), M. R. Kesheorey (2)

(1) IIIT Hyderabad, India
(2) Center for Artificial Intelligence & Robotics, India

Throat microphones (TM) are robust to background noise and can therefore be used in environments with high noise levels, but speech collected using a TM is perceptually less natural. The objective of this paper is to map the spectral features (represented as cepstral features) of TM speech to those of close speaking microphone (CSM) speech, both to improve the former's perceptual quality and to represent it efficiently for coding. The spectral mapping from TM to CSM speech is performed by a multilayer feed-forward neural network trained on features derived from TM and CSM speech. The sequence of estimated CSM spectral features is quantized and coded as a sequence of codebook indices using vector quantization. The sequence of codebook indices, together with the pitch contour and the energy contour derived from the TM signal, is used to store or transmit the TM speech information efficiently. At the receiver, the all-pole system corresponding to the estimated CSM spectral vectors is excited by a synthetic residual to generate the speech signal.
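
The vector quantization step described above can be illustrated with a minimal sketch. It assumes a nearest-neighbour search under squared Euclidean distance and uses a toy two-dimensional codebook; the function names `vq_encode` and `vq_decode`, the codebook values, and the feature vectors are hypothetical and are not taken from the paper, which does not state its codebook size, distance measure, or feature dimension here.

```python
import numpy as np


def vq_encode(features, codebook):
    """Map each feature vector (row) to the index of its nearest codeword.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    # Shape (frames, codewords, dims): difference between every frame
    # and every codeword, computed by broadcasting.
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)


def vq_decode(indices, codebook):
    """Recover the quantized feature vectors from codebook indices."""
    return codebook[indices]


# Toy stand-ins for estimated CSM spectral vectors (hypothetical values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
frames = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.4, 0.4]])

indices = vq_encode(frames, codebook)          # what would be stored or sent
reconstructed = vq_decode(indices, codebook)   # what the receiver would use
```

Only the integer indices need to be stored or transmitted per frame, which is where the saving comes from: a codebook with N entries costs about log2(N) bits per frame regardless of the dimension of the spectral vector.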

Bibliographic reference.  Anand, Joseph M. / Yegnanarayana, B. / Gupta, Sanjeev / Kesheorey, M. R. (2009): "Speaker dependent mapping for low bit rate coding of throat microphone speech", In INTERSPEECH-2009, 1087-1090.