Phoneme-Discriminative Features for Dysarthric Speech Conversion

Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki


We present a Voice Conversion (VC) method for a person with dysarthria resulting from athetoid cerebral palsy. VC is widely researched in speech processing because of growing interest in applications such as personalized Text-To-Speech systems. Gaussian Mixture Model (GMM)-based VC has been studied extensively, and Partial Least Squares (PLS)-based VC has been proposed to prevent the over-fitting problems associated with GMM-based VC. In this paper, we present phoneme-discriminative features for use with PLS-based VC. Conventional VC methods do not consider the phonetic structure of spectral features, although phonetic structure is important for speech analysis. This matters especially for dysarthric speech, whose phonetic structure is difficult to discriminate; discriminative learning should therefore improve conversion accuracy. To this end, this paper employs discriminative manifold learning: spectral features are projected into a subspace in which nearby points sharing the same phoneme label are brought close together, while nearby points with different phoneme labels are pushed apart. The proposed method was evaluated on a dysarthric speaker conversion task that converts dysarthric speech into non-dysarthric speech.
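The phoneme-discriminative projection described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a *linear* projection learned from two k-nearest-neighbour graphs, a generic graph-embedding formulation similar in spirit to Marginal Fisher Analysis; the abstract does not give the authors' exact objective, so the function name `discriminative_projection` and the parameters `k_within`, `k_between` and `reg` are hypothetical. Nearby frames with the same phoneme label are penalised for ending up far apart, nearby frames with different labels are rewarded for ending up far apart, and the projection comes from the resulting generalized eigenproblem.

```python
import numpy as np


def discriminative_projection(X, labels, n_components, k_within=5, k_between=5, reg=1e-3):
    """Hypothetical sketch (not the paper's exact method): learn a linear map in which
    nearby frames sharing a phoneme label stay close and nearby frames with
    different labels are pushed apart.

    X: (n_frames, n_dims) spectral features; labels: (n_frames,) phoneme labels.
    Returns P with shape (n_dims, n_components); project with X @ P.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    # Pairwise squared Euclidean distances between frames.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T

    W_within = np.zeros((n, n))
    W_between = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero(labels == labels[i])
        same = same[same != i]  # a frame is never its own neighbour
        diff = np.flatnonzero(labels != labels[i])
        # k nearest neighbours with the same phoneme label.
        if same.size:
            W_within[i, same[np.argsort(dist[i, same])[:k_within]]] = 1.0
        # k nearest neighbours with a different phoneme label.
        if diff.size:
            W_between[i, diff[np.argsort(dist[i, diff])[:k_between]]] = 1.0
    # Symmetrise so each neighbourhood graph is undirected.
    W_within = np.maximum(W_within, W_within.T)
    W_between = np.maximum(W_between, W_between.T)

    def scatter(W):
        # X^T L X with graph Laplacian L = D - W, which equals
        # 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T.
        L = np.diag(W.sum(axis=1)) - W
        return X.T @ L @ X

    S_w = scatter(W_within) + reg * np.eye(d)  # ridge term keeps S_w invertible
    S_b = scatter(W_between)

    # Solve S_b p = lambda * S_w p: whiten S_w, then diagonalise the whitened S_b.
    evals, evecs = np.linalg.eigh(S_w)
    whiten = evecs / np.sqrt(evals)
    e, V = np.linalg.eigh(whiten.T @ S_b @ whiten)
    top = np.argsort(e)[::-1][:n_components]  # largest between/within ratios
    return whiten @ V[:, top]


if __name__ == "__main__":
    # Toy data: two hypothetical "phoneme" classes separated along dimension 0,
    # with large label-irrelevant variance along dimension 1.
    rng = np.random.default_rng(0)
    n_per = 50
    a = np.column_stack([rng.normal(0.0, 0.3, n_per), rng.normal(0.0, 5.0, n_per)])
    b = np.column_stack([rng.normal(2.0, 0.3, n_per), rng.normal(0.0, 5.0, n_per)])
    X = np.vstack([a, b])
    y = np.array([0] * n_per + [1] * n_per)
    P = discriminative_projection(X, y, n_components=1)
    print("projection direction:", P.ravel())
```

On the toy data the learned direction is dominated by dimension 0, the one that separates the two labels, even though dimension 1 carries far more variance. The ridge term `reg` is a design choice of this sketch that keeps the within-label scatter invertible when the neighbourhood graph is sparse or the feature dimension exceeds the number of frames.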


DOI: 10.21437/Interspeech.2017-664

Cite as: Aihara, R., Takiguchi, T., Ariki, Y. (2017) Phoneme-Discriminative Features for Dysarthric Speech Conversion. Proc. Interspeech 2017, 3374-3378, DOI: 10.21437/Interspeech.2017-664.


@inproceedings{Aihara2017,
  author={Ryo Aihara and Tetsuya Takiguchi and Yasuo Ariki},
  title={Phoneme-Discriminative Features for Dysarthric Speech Conversion},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3374--3378},
  doi={10.21437/Interspeech.2017-664},
  url={http://dx.doi.org/10.21437/Interspeech.2017-664}
}