First International Conference on Spoken Language Processing (ICSLP 90)
Results are presented from a multi-speaker, multi-lingual method of phonetic label alignment. The method combines a Self-Organising Neural Network with a Viterbi decoding and Level-Building technique that is constrained by an independently specified string of phonetic segments. The Neural Network is trained to convert vectors of cepstral coefficients into vectors of continuously valued acoustic-phonetic features, and to derive a multi-dimensional Gaussian probability density function for each phonemic unit. Multi-lingual application simply requires the definition of the features for each new language. The Viterbi decoding and Level-Building technique is applied to the task of performing label alignment on large speech corpora. The paper first presents results for Danish and English, with distributions for selected features and phonemes in the two languages to show the validity of the approach. Covariance analysis within a language allows the features to be reduced to a maximally discriminative set, and comparison across the languages points to the multi-lingual validity of the feature definitions. Second, results are given in a number of histograms showing the accuracy of the aligned label positions for selected phoneme classes compared with the corresponding positions in manually labelled test databases. The work has been developed in part under the ESPRIT project 'Speech Assessment Methodology' (SAM).
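The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-phoneme Gaussian parameters are already available (in the paper they are derived by the Neural Network from the acoustic-phonetic feature vectors), and it uses a plain left-to-right Viterbi pass over the given label string rather than the Level-Building variant. The names `gaussian_loglik`, `align` and `models` are hypothetical.

```python
import numpy as np

def gaussian_loglik(frames, mean, cov):
    """Log density of every frame under one multivariate Gaussian."""
    d = frames.shape[1]
    diff = frames - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ti,ij,tj->t", diff, inv, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def align(frames, labels, models):
    """Viterbi alignment constrained to a known label string.

    frames: (T, D) array of feature vectors (assumed input).
    labels: phoneme symbols in their known order.
    models: dict mapping symbol -> (mean, covariance); assumed given.
    Returns the start frame index of each label. Requires T >= len(labels).
    """
    T, N = len(frames), len(labels)
    # Column n holds the log-likelihood of each frame under label n.
    ll = np.stack([gaussian_loglik(frames, *models[p]) for p in labels], axis=1)
    score = np.full((T, N), -np.inf)
    advanced = np.zeros((T, N), dtype=bool)  # True if best path entered label n at frame t
    score[0, 0] = ll[0, 0]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            advanced[t, n] = move > stay
            score[t, n] = max(stay, move) + ll[t, n]
    # Backtrace from the last label at the last frame.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if advanced[t, n]:
            starts[n] = t
            n -= 1
    return starts

# Toy usage with two made-up phoneme models in a 2-D feature space.
models = {"a": (np.zeros(2), np.eye(2)), "b": (np.full(2, 5.0), np.eye(2))}
frames = np.array([[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1],
                   [5.0, 5.0], [4.9, 5.1], [5.2, 4.8], [5.0, 5.1]])
print(align(frames, ["a", "b"], models))
```

Because the label string is fixed in advance, the search only decides where each boundary falls, which is what makes the approach suitable for labelling large corpora against a known transcription.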
Bibliographic reference. Dalsgaard, Paul / Barry, William (1990): "Acoustic-phonetic features in the framework of neural-network multi-lingual label alignment", In ICSLP-1990, 945-948.