PhonVoc: A Phonetic and Phonological Vocoding Toolkit

Milos Cernak, Philip N. Garner


We present the PhonVoc toolkit, a cascaded deep neural network (DNN) composed of a speech analyser and a speech synthesizer that share a phonetic and/or phonological speech representation. The toolkit is free, open-source software distributed under a BSD 3-Clause License and available at https://github.com/idiap/phonvoc. It ships with pre-trained US English analysis and synthesis DNNs, so it is ready for immediate use.

In a broader context, the toolkit implements training and testing of the analysis-by-synthesis heuristic model. It is thus designed for the wider speech community working in acoustic phonetics, laboratory phonology, and parametric speech coding. The toolkit interprets the phonetic posterior probabilities as a sequential scheme, whereas the phonological class posterior probabilities are treated as a parallel scheme over K different phonological classes. A case study is presented on the LibriSpeech database and a LibriVox US English native female speaker. Phonetic and phonological vocoding yield comparable performance, and merging the phonetic and phonological speech representations improves speech quality.
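
The contrast between the sequential and the parallel scheme can be illustrated with a small sketch for a single speech frame. It is not taken from the PhonVoc code. It assumes that the phonetic posteriors form one distribution over mutually exclusive phones (modelled here with a softmax), and that each of the K phonological classes has its own independent probability (modelled here with a sigmoid), so several classes may be active at once. The phone labels, class names, and scores are hypothetical.

```python
import math


def softmax(scores):
    """Map raw scores to one categorical distribution that sums to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Map one raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical labels and raw network outputs for one frame (assumptions).
PHONES = ["a", "m", "s"]
CLASSES = ["voiced", "nasal", "vowel"]  # K = 3 phonological classes

phone_scores = [0.2, 2.5, -1.0]
class_scores = [3.0, 2.0, -2.5]

# Sequential reading: phones compete, so each frame has one winning phone.
phonetic = softmax(phone_scores)
best_phone = PHONES[phonetic.index(max(phonetic))]

# Parallel reading: each class is scored on its own, so several can be active.
phonological = [sigmoid(s) for s in class_scores]
active_classes = [c for c, p in zip(CLASSES, phonological) if p > 0.5]

print(best_phone, active_classes)
```

In this sketch the phonetic vector always sums to one, while the phonological vector has no such constraint, which is one way to read "sequential" versus "parallel".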


DOI: 10.21437/Interspeech.2016-235

Cite as

Cernak, M., Garner, P.N. (2016) PhonVoc: A Phonetic and Phonological Vocoding Toolkit. Proc. Interspeech 2016, 988-992.

Bibtex
@inproceedings{Cernak+2016,
author={Milos Cernak and Philip N. Garner},
title={PhonVoc: A Phonetic and Phonological Vocoding Toolkit},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-235},
url={http://dx.doi.org/10.21437/Interspeech.2016-235},
pages={988--992}
}