This paper reports on our recent research on surface electromyographic (EMG) speech synthesis: the direct conversion of EMG signals captured from articulatory muscle movements into an acoustic speech signal. We introduce a unit selection approach that compares segments of the input EMG signal to a database of simultaneously recorded EMG/audio unit pairs, selects the best-matching audio units based on target and concatenation costs, and concatenates these units to synthesize an acoustic speech output. We show that this approach is able to generate proper speech output from the input EMG signal. We evaluate different properties of the units and investigate how much data is necessary for an initial transformation. Prior work on EMG-to-speech conversion used a frame-based approach from the voice conversion domain, which struggles to generate a natural F0 contour; our unit selection approach may also address this problem.
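The selection step described above, choosing one audio unit per input EMG segment so that the sum of target costs (EMG similarity) and concatenation costs (smoothness of audio joins) is minimized, is a classic dynamic-programming search. The following is a minimal illustrative sketch, not the authors' implementation: the feature representations, Euclidean cost functions, and cost weights are assumptions for the example.

```python
# Sketch of unit selection via dynamic programming (Viterbi search).
# NOTE: feature representations, cost functions, and weights are
# hypothetical placeholders, not the paper's actual configuration.
import numpy as np

def select_units(input_segments, database, w_target=1.0, w_concat=0.5):
    """Pick one database unit per input EMG segment, minimizing the sum
    of target costs (input EMG vs. unit EMG features) and concatenation
    costs (discontinuity between consecutive units' audio features).

    input_segments : list of EMG feature vectors, one per unit slot
    database       : list of (emg_features, audio_features) unit pairs
    returns        : list of database indices, one per input segment
    """
    T, N = len(input_segments), len(database)
    # Target cost matrix: distance from each input segment to each unit's EMG.
    target = np.array([[np.linalg.norm(seg - emg) for emg, _ in database]
                       for seg in input_segments])
    # Concatenation cost matrix: audio discontinuity between unit pairs.
    concat = np.array([[np.linalg.norm(a1 - a2) for _, a2 in database]
                       for _, a1 in database])

    # Forward pass over the unit lattice.
    cost = w_target * target[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + w_concat * concat   # (previous, current)
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(N)] + w_target * target[t]

    # Backtrack the lowest-cost unit sequence.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return path
```

The selected units' audio would then be concatenated (with smoothing at the joins) to produce the output waveform; raising `w_concat` trades segment-level accuracy for smoother transitions.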
Bibliographic reference. Zahner, Marlene / Janke, Matthias / Wand, Michael / Schultz, Tanja (2014): "Conversion from facial myoelectric signals to speech: a unit selection approach", in Proceedings of INTERSPEECH 2014, 1184-1188.