Robust stochastic modeling of speech is important for the performance of practical applications. The Gaussian mixture model (GMM) is widely used in speaker ID, but its performance is limited in the presence of unseen noise and distortions. Such noisy data, which act as "outliers" with respect to the original distribution, can be better represented by heavy-tailed distributions such as Student's t-distribution, a natural choice because the heaviness of the tail is controlled by the degrees-of-freedom parameter. We explore a finite mixture of t-distributions model (tMM) to represent noisy speech data and show its robustness for speaker ID compared to the GMM. Using the TIMIT and NTIMIT databases, the recognition accuracies obtained with a 34-mixture tMM are 100% and 79.68%, respectively, much better than those reported in the literature.
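The role of the degrees-of-freedom parameter can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes univariate data for readability (speech features are normally multivariate), and the function names `gaussian_logpdf`, `student_t_logpdf`, and `mixture_logpdf` are hypothetical. It compares the log-density that a Gaussian and a Student's t component assign to an outlying point, and shows how component densities combine in a finite mixture.

```python
import math


def gaussian_logpdf(x, mean=0.0, scale=1.0):
    """Log-density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, dof, mean=0.0, scale=1.0):
    """Log-density of a univariate Student's t with `dof` degrees of freedom.

    Small `dof` gives heavy tails; as `dof` grows the density approaches the Gaussian.
    """
    z = (x - mean) / scale
    return (
        math.lgamma((dof + 1.0) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * math.log(dof * math.pi)
        - math.log(scale)
        - 0.5 * (dof + 1.0) * math.log1p(z * z / dof)
    )


def mixture_logpdf(x, weights, component_logpdfs):
    """Log-density of a finite mixture, computed with log-sum-exp for stability.

    `weights` must be positive and sum to one; `component_logpdfs` is a list of
    one-argument functions returning each component's log-density at x.
    """
    terms = [math.log(w) + f(x) for w, f in zip(weights, component_logpdfs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    outlier = 6.0  # six standard deviations from the mean (illustrative value)
    print("Gaussian log-density:", gaussian_logpdf(outlier))
    for dof in (2.0, 5.0, 30.0, 1000.0):
        print("t log-density, dof =", dof, ":", student_t_logpdf(outlier, dof))
```

With a small number of degrees of freedom the t component assigns far more probability mass to the outlying point than the Gaussian does, so a few noisy frames pull the fitted parameters less strongly. As the degrees of freedom grow, the two log-densities converge.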
Bibliographic reference. Harshavardhan, Sundar / Sreenivas, Thippur V. (2010): "Robust mixture modeling using t-distribution: application to speaker ID", In INTERSPEECH-2010, 2750-2753.