An electrolarynx is a device that artificially generates excitation sounds to enable laryngectomees to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural because of the mechanical excitation produced by the device. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and shown that statistical prediction of excitation parameters, such as F0 patterns, is essential for significantly improving the naturalness of EL speech. In these methods, the original EL speech is recorded with a microphone and the enhanced EL speech is played through a loudspeaker in real time. This framework is effective for telecommunication, but it is not suitable for face-to-face conversation because listeners hear both the original EL speech and the enhanced EL speech. In this paper, we propose direct F0 control of the electrolarynx based on statistical excitation prediction, with the aim of developing an EL speech enhancement technique that is also effective for face-to-face conversation. The F0 patterns of the excitation signals produced by the electrolarynx are predicted in real time from the EL speech itself, that is, from the speech the laryngectomee produces by articulating excitation signals generated with previously predicted F0 values. A simulation experiment is conducted to evaluate the effectiveness of the proposed method. The experimental results demonstrate that the proposed method yields significant improvements in the naturalness of EL speech while keeping its intelligibility sufficiently high.
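The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-frame feature representation, the initial F0 of 100 Hz, and the linear smoothing predictor are all hypothetical stand-ins for the statistical excitation prediction model, chosen only to show how each predicted F0 value drives the device for the next frame.

```python
from typing import Callable, List, Sequence


def toy_predictor(features: Sequence[float], previous_f0: float) -> float:
    """Hypothetical stand-in for the statistical F0 predictor.

    Maps the mean of the frame's features to a target F0 (Hz) and
    smooths it with the F0 that produced the observed frame.
    """
    target = 100.0 + 50.0 * (sum(features) / len(features))
    return 0.5 * previous_f0 + 0.5 * target


def simulate_direct_f0_control(
    frames: Sequence[Sequence[float]],
    predict_f0: Callable[[Sequence[float], float], float],
    initial_f0: float = 100.0,  # assumed starting F0, not from the paper
) -> List[float]:
    """Return the F0 (Hz) driving the simulated device at each frame."""
    f0 = initial_f0
    driven: List[float] = []
    for features in frames:
        # The device emits excitation at the F0 predicted from earlier frames.
        driven.append(f0)
        # The resulting EL speech frame (here reduced to its features plus
        # the F0 that excited it) is used to predict the next F0 value.
        f0 = predict_f0(features, f0)
    return driven


if __name__ == "__main__":
    # Three toy frames, each with a single articulation feature.
    print(simulate_direct_f0_control([[0.0], [1.0], [1.0]], toy_predictor))
```

The point of the sketch is the one-frame delay: the F0 applied at a given frame is always a prediction made from EL speech that was itself produced with an earlier prediction.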
Bibliographic reference. Tanaka, Kou / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2014): "Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation", In INTERSPEECH-2014, 31-35.