INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

A Digital Signal Processor Implementation of Silent/Electrolaryngeal Speech Enhancement based on Real-Time Statistical Voice Conversion

Takuto Moriguchi (1), Tomoki Toda (1), Motoaki Sano (2), Hiroshi Sato (2), Graham Neubig (1), Sakriani Sakti (1), Satoshi Nakamura (1)

(1) NAIST, Japan
(2) Foster Electronic Co. Ltd., Japan

In this paper, we present a digital signal processor (DSP) implementation of real-time statistical voice conversion (VC) for silent speech enhancement and electrolaryngeal speech enhancement. As a silent speech interface, we focus on non-audible murmur (NAM), which can be used in situations where audible speech is not acceptable. Electrolaryngeal speech is a typical type of alaryngeal speech, produced by an alternative speaking method for laryngectomees. However, the sound quality of both NAM and electrolaryngeal speech suffers from a lack of naturalness. VC has proven to be a promising approach to this problem, and it has been successfully implemented on devices with sufficient computational resources. An implementation on devices that are highly portable but have limited computational resources would greatly contribute to its practical use. In this paper, we go further and implement real-time VC on a DSP. To implement the two speech enhancement systems based on real-time VC, one from NAM to a whispered voice and the other from electrolaryngeal speech to a natural voice, we propose several methods for reducing computational cost while preserving conversion accuracy. We conduct experimental evaluations and show that real-time VC is capable of running on a DSP with little degradation.


Bibliographic reference. Moriguchi, Takuto / Toda, Tomoki / Sano, Motoaki / Sato, Hiroshi / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2013): "A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion", in INTERSPEECH-2013, 3072-3076.