Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Whither Linguistic Interpretation of Acoustic Pronunciation Variation

Annika Hämäläinen, Yan Han, Lou Boves, Louis ten Bosch

Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, The Netherlands

Recent research suggests that pronunciation variation is better modelled at the syllable level than at the level of context-dependent phones. Because a large number of factors affect syllable pronunciation, multi-path model topologies are necessary. Previous research on connected digit recognition has shown trajectory clustering to be an attractive approach to deriving multi-path models. In this paper, we extend our research to large-vocabulary continuous speech recognition (LVCSR) by deriving trajectory clusters for 94 frequent syllables in a 20-hour corpus of Dutch read speech. With multi-path models based on these trajectory clusters, speech recognition performance improves significantly. We believe that recognition performance can be improved further by adapting the topologies of the parallel paths. However, the physical properties of the clusters do not provide clues to the most appropriate topology or to the best way of initialising the state observation densities. Therefore, we attempt to interpret the clusters in terms of linguistic and phonetic criteria. The results obtained so far suggest that there is no straightforward relation between physically defined trajectory clusters and linguistic and phonetic criteria.


Bibliographic reference. Hämäläinen, Annika / Han, Yan / Boves, Lou / Bosch, Louis ten (2006): "Whither linguistic interpretation of acoustic pronunciation variation", in SRIV-2006, 21-26.