In this paper we investigate the behaviour of different acoustic distance measures for template-based speech recognition when acoustic distances are combined with linguistic knowledge and template concatenation fluency costs. To that end, the distance measures are compared on tasks with varying levels of fluency and linguistic constraints. We show that adopting those constraints invariably results in the winning hypothesis being a template sequence that is clearly suboptimal acoustically. This has strong implications for the design of acoustic distance measures: measures that are optimal for frame-based classification may prove to be suboptimal for full sentence recognition. In particular, we show that this is the case when comparing the Euclidean distance measure with the recently introduced adaptive kernel local Mahalanobis distance measure.
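The effect described above, where fluency constraints push the search away from the acoustically best template sequence, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two template sources `A` and `B`, the per-segment distances, the flat switching penalty used as a concatenation cost, and the exhaustive search standing in for a real decoder.

```python
import itertools

# Hypothetical per-segment acoustic distances for two template sources.
# Lower is better; the numbers are invented for illustration.
ACOUSTIC = [
    {"A": 1.0, "B": 1.4},
    {"A": 1.5, "B": 1.0},
    {"A": 1.0, "B": 1.4},
]


def concat_cost(prev, cur, penalty):
    """Assumed fluency cost: a flat penalty when the template source changes."""
    return 0.0 if prev == cur else penalty


def best_sequence(acoustic, penalty):
    """Exhaustively find the template sequence with the lowest combined cost."""
    best = None
    for seq in itertools.product("AB", repeat=len(acoustic)):
        # Sum of acoustic distances along the sequence.
        total = sum(acoustic[i][t] for i, t in enumerate(seq))
        # Add the concatenation cost for each adjacent pair of templates.
        total += sum(concat_cost(a, b, penalty) for a, b in zip(seq, seq[1:]))
        if best is None or total < best[0]:
            best = (total, seq)
    return best


if __name__ == "__main__":
    print(best_sequence(ACOUSTIC, penalty=0.0))
    print(best_sequence(ACOUSTIC, penalty=0.5))
```

With no penalty the search picks the per-segment acoustic minimum, which switches source at every segment. With a penalty of 0.5 the winner stays on one source even though its summed acoustic distance is higher, which is the kind of acoustically suboptimal hypothesis the abstract refers to.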
Bibliographic reference. De Wachter, Mathias / Demuynck, Kris / Wambacq, Patrick / Van Compernolle, Dirk (2007): "Evaluating acoustic distance measures for template based recognition", In INTERSPEECH-2007, 874-877.