EUROSPEECH 2003 - INTERSPEECH 2003
This article reports a two-part study of structured acoustic modeling of speech. First, speaker-independent clustering of speech material was used as the basis for a practical form of cluster-based acoustic modeling. Each cluster's training material is used to adapt baseline hidden Markov model (HMM) parameters for recognition. The same material is also used to train phone-level Gaussian mixture models (GMMs) for cluster identification. Test utterances are evaluated on all such models to identify an appropriate cluster or combination of clusters. Experiments demonstrate that such cluster-based adaptation can yield accuracy gains over computationally similar baseline models. However, these gains, like those of similar methods reported in the literature, are modest. The second part of the study therefore examined the limitations of the approach by considering utterance consistency: that is, the ability of acoustically derived cluster models to uniquely identify a single utterance. These experiments show that arbitrary pieces of a given utterance are likely to be identified by different clusters, contrary to an implicit assumption of cluster-based acoustic modeling.
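As an illustration only, the cluster-identification step and the utterance-consistency check described above might be sketched as follows. This is not the paper's implementation. The abstract does not give the scoring rule, so the sketch assumes average per-frame log-likelihood with an argmax decision, and it simplifies the phone-level GMMs to a single diagonal-covariance GMM per cluster. All function names and the toy data are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (K,); means, variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, K)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def identify_cluster(frames, cluster_gmms):
    """Assumed decision rule: pick the cluster with the highest mean log-likelihood."""
    scores = [gmm_log_likelihood(frames, *gmm).mean() for gmm in cluster_gmms]
    return int(np.argmax(scores))

def piece_decisions(frames, cluster_gmms, n_pieces):
    """Split one utterance into contiguous pieces and identify each separately."""
    return [identify_cluster(piece, cluster_gmms)
            for piece in np.array_split(frames, n_pieces)]

# Toy data (hypothetical): two clusters, each a one-component GMM in 2-D.
toy_gmms = [
    (np.array([1.0]), np.array([[0.0, 0.0]]), np.ones((1, 2))),
    (np.array([1.0]), np.array([[5.0, 5.0]]), np.ones((1, 2))),
]
# One "utterance": 12 frames near cluster 0 followed by 8 frames near cluster 1.
toy_utterance = np.vstack([np.zeros((12, 2)), np.full((8, 2), 5.0)])

whole_decision = identify_cluster(toy_utterance, toy_gmms)
half_decisions = piece_decisions(toy_utterance, toy_gmms, n_pieces=2)
```

In this toy case the whole utterance is assigned to cluster 0 while its second half is assigned to cluster 1, which is the kind of within-utterance disagreement the second set of experiments measures.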
Bibliographic reference. Peters, S. Douglas (2003): "On the limits of cluster-based acoustic modeling", In EUROSPEECH-2003, 1857-1860.