INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Across-Speaker Articulatory Normalization for Speaker-Independent Silent Speech Recognition

Jun Wang (1), Ashok Samal (2), Jordan R. Green (3)

(1) University of Texas at Dallas, USA
(2) University of Nebraska-Lincoln, USA
(3) MGH Institute of Health Professions, USA

Silent speech interfaces (SSIs), which recognize speech from articulatory information (i.e., without using audio information), have the potential to enable persons with laryngectomy or a neurological disease to produce synthesized speech with a natural-sounding voice using their tongue and lips. Current approaches to SSIs have largely relied on speaker-dependent recognition models to minimize the negative effects of talker variation on recognition accuracy. Speaker-independent approaches are needed to reduce the large amount of training data required from each user; often only limited articulatory samples are available from persons with moderate to severe speech impairments, due to the logistical difficulty of data collection. This paper reports an across-speaker articulatory normalization approach based on Procrustes matching, a bidimensional regression technique for removing translational, scaling, and rotational effects from spatial data. A dataset of short functional sentences was collected from seven English talkers. A support vector machine was then trained to classify sentences based on normalized tongue and lip movements. Speaker-independent classification accuracy (tested using leave-one-subject-out cross-validation) improved significantly, from 68.63% to 95.90%, following normalization. These results support the feasibility of a speaker-independent SSI using Procrustes matching as the basis for articulatory normalization across speakers.
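To make the three effects named in the abstract concrete, the following is a minimal sketch of generic two-dimensional Procrustes alignment: center the shape (translation), rescale it to unit size (scaling), and rotate it onto a reference (rotation). It is a textbook formulation and not necessarily the exact procedure used in the paper. The function names, the representation of a shape as a list of (x, y) sensor positions, and the use of a reference shape as the rotation target are all assumptions made for illustration.

```python
import math

# Illustrative sketch only: a "shape" is assumed to be a list of (x, y)
# points, e.g. sampled tongue and lip sensor positions for one utterance.

def center(shape):
    """Remove translation by moving the centroid to the origin."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    return [(x - cx, y - cy) for x, y in shape]

def unit_scale(shape):
    """Remove scale by dividing by the RMS distance from the origin.

    Assumes the shape is already centered and is not a single repeated point.
    """
    size = math.sqrt(sum(x * x + y * y for x, y in shape) / len(shape))
    return [(x / size, y / size) for x, y in shape]

def rotate(shape, theta):
    """Rotate every point counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in shape]

def best_rotation(shape, reference):
    """Angle that minimizes the summed squared distance to the reference.

    Closed-form least-squares solution for two dimensions; both inputs are
    assumed to be centered and to list corresponding points in the same order.
    """
    num = sum(v * x - u * y for (x, y), (u, v) in zip(shape, reference))
    den = sum(u * x + v * y for (x, y), (u, v) in zip(shape, reference))
    return math.atan2(num, den)

def procrustes_normalize(shape, reference):
    """Center, rescale, and rotate a shape onto a reference shape."""
    ref = unit_scale(center(reference))
    normalized = unit_scale(center(shape))
    return rotate(normalized, best_rotation(normalized, ref))
```

Under these assumptions, a shape that differs from the reference only by a shift, a uniform scale factor, and a rotation is mapped back onto the centered, unit-size reference, so what remains after normalization is the difference in shape itself. The normalized coordinates could then be flattened into feature vectors for a classifier such as the support vector machine mentioned in the abstract.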


Bibliographic reference.  Wang, Jun / Samal, Ashok / Green, Jordan R. (2014): "Across-speaker articulatory normalization for speaker-independent silent speech recognition", In INTERSPEECH-2014, 1179-1183.