Unsupervised discovery of subword units is an important problem in recognition and synthesis for zero-resource languages, where the phone set may be unknown and speech may be the only available resource. To discover such units, we use techniques we recently developed for building synthetic voices for very low-resource languages that have no written form. We use Articulatory Features trained on labeled speech in a higher-resource language to infer phonological segments of varying granularity. Both the raw Articulatory Features and the Articulatory Features of the inferred units serve as frame-based representations of speech. We evaluate these representations on minimal-pair ABX discrimination, both within and across speakers. In addition, to exploit the duration information provided by the inferred phonological units, we report results on Mel Cepstral Distortion, an objective metric of speech synthesis quality. We evaluate on several English databases, as well as on Tsonga and Indic languages, to which we apply the above methods cross-lingually.
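The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper. MCD uses the common frame-averaged form (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) over mel-cepstra that are already time-aligned, with the energy coefficient c0 excluded. The ABX distance between two tokens is a DTW alignment with frame-wise cosine distance, averaged along the path. This is an assumed choice, not necessarily the paper's exact configuration. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one frame, e.g. an Articulatory Feature vector


def mel_cepstral_distortion(ref: List[Frame], syn: List[Frame]) -> float:
    """Mean MCD in dB over frame-aligned mel-cepstra, skipping c0 (assumed convention)."""
    if len(ref) != len(syn) or not ref:
        raise ValueError("expects non-empty, frame-aligned sequences")
    k = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        # Squared difference over coefficients c1..cD only.
        sq = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        total += k * math.sqrt(2.0 * sq)
    return total / len(ref)


def cosine_distance(u: Frame, v: Frame) -> float:
    """1 - cosine similarity; an all-zero frame is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def dtw_distance(x: List[Frame], y: List[Frame]) -> float:
    """DTW cost with cosine frame distance, normalised by alignment path length."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds (accumulated cost, number of steps on the best path).
    cost = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(x[i - 1], y[j - 1])
            best = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = (best[0] + d, best[1] + 1)
    total, steps = cost[n][m]
    return total / steps


def abx_error(triplets: List[Tuple[List[Frame], List[Frame], List[Frame]]]) -> float:
    """Error rate over (A, B, X) triplets where X shares A's category.

    An error is counted when X is closer to B than to A; ties count as half an error.
    """
    errors = 0.0
    for a, b, x in triplets:
        d_ax, d_bx = dtw_distance(a, x), dtw_distance(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triplets)
```

Each frame could be a raw Articulatory Feature vector or the Articulatory Features of an inferred unit. The sketch only requires every frame to be a fixed-length list of floats. A lower ABX error means the representation separates the minimal pair better. A lower MCD means the synthesized cepstra are closer to the reference.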
Bibliographic reference. Baljekar, Pallavi / Sitaram, Sunayana / Muthukumar, Prasanna Kumar / Black, Alan W. (2015): "Using articulatory features and inferred phonological segments in zero resource speech processing", In INTERSPEECH-2015, 3194-3198.