INTERSPEECH 2015
In this paper we introduce a new Italian dataset consisting of simultaneous
recordings of continuous speech and trajectories of important vocal
tract articulators (i.e. tongue, lips, incisors) tracked by Electromagnetic
Articulography (EMA). It includes more than 500 sentences uttered in
citation condition by three speakers, one male (cnz) and two females
(lls, olm), for approximately 2 hours of speech material.
The dataset has been
designed to be large enough and phonetically balanced so as to be used
in speech applications (e.g. speech recognition systems).
We then test our speaker-dependent
articulatory Deep-Neural-Network Hidden-Markov-Model (DNN-HMM) phone
recognizer on the set of data recorded from the cnz speaker.
We show that phone
recognition results are comparable to those we previously obtained
using two well-known British-English datasets with EMA data of equivalent
vocal tract articulators. This suggests that the new dataset is
an equally useful and coherent resource.
The dataset is
session 1 of a larger Italian corpus, called the Multi-SPeaKing-style-Articulatory
(MSPKA) corpus, including parallel audio and articulatory data in diverse
speaking styles (e.g. read, hyperarticulated and hypoarticulated speech).
It is freely available at http://www.mspkacorpus.it for research purposes.
The whole corpus will be released in the near future.
Bibliographic reference. Canevari, Claudia / Badino, Leonardo / Fadiga, Luciano (2015): "A new Italian dataset of parallel acoustic and articulatory data", In INTERSPEECH-2015, 2152-2156.