DiSS-LPSS Joint Workshop 2010
The 5th Workshop on Disfluency in Spontaneous Speech
Vocal effort mismatch between training and test data leads to severe
degradation of speaker recognition systems. The changes in
the acoustics of a speech signal induced by raised vocal effort
are complex and, despite several studies by various authors,
not yet completely understood.
Rather than merely gaining knowledge about these differences,
for automatic speaker recognition it is essential to discover
features that remain relatively stable under changing vocal
effort conditions and contain speaker-specific information. In
this study we investigate the center of gravity (COG) ratio for
high and mid frequency bands as a feature for speaker recognition.
We find that vocal effort mismatch leads to an equal error
rate (EER) more than six times higher for a standard MFCC-based
GMM-UBM system. For the COG ratio we observe a
much smaller degradation of around 25%.
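
To make the feature concrete, the following is a minimal sketch of one way a COG ratio could be computed for a single speech frame. It assumes that the COG of a band is the power-weighted mean frequency within that band and that the ratio is the high-band COG divided by the mid-band COG. The abstract does not give the exact definition or the band edges, so the function names `band_cog` and `cog_ratio` and the default band limits are hypothetical choices for illustration only.

```python
import numpy as np


def band_cog(freqs, power, lo, hi):
    """Power-weighted mean frequency (center of gravity) within [lo, hi) Hz."""
    mask = (freqs >= lo) & (freqs < hi)
    band_power = power[mask]
    total = band_power.sum()
    if total <= 0.0:
        raise ValueError("no spectral energy in the requested band")
    return float((freqs[mask] * band_power).sum() / total)


def cog_ratio(frame, sample_rate,
              mid_band=(1000.0, 3000.0),    # hypothetical band edges in Hz
              high_band=(3000.0, 8000.0)):  # hypothetical band edges in Hz
    """Ratio of high-band COG to mid-band COG for one frame (assumed definition)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))   # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2  # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return band_cog(freqs, power, *high_band) / band_cog(freqs, power, *mid_band)
```

In a speaker recognition front end, such a value would typically be computed per frame and modeled alongside or instead of MFCCs; the framing and modeling details are not specified in the abstract.
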
When adapting the UBM with additional high-effort speech
data, the EER of the COG ratio becomes even lower in the mismatched
condition than in the matched one. Combining MFCCs and the
COG ratio gives the best results, with an overall improvement of
16% compared to the standard MFCC-based system.
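
The results above are stated in terms of the EER and percentage changes of it. As a minimal sketch of how such figures are commonly obtained, the code below estimates the EER from lists of target and impostor trial scores by sweeping a threshold and taking the point where the false acceptance and false rejection rates are closest. It also computes a relative change between two EER values, on the assumption that the percentages are relative rather than absolute. The function names are hypothetical and this is not the evaluation code used in the study.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: operating point where false accept rate ~ false reject rate."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate threshold; a trial is accepted
    # when its score is greater than or equal to the threshold.
    thresholds = np.sort(np.concatenate([target, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target < t).mean() for t in thresholds])     # false rejects
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def relative_change(baseline_eer, new_eer):
    """Relative change of new_eer with respect to baseline_eer (0.25 means +25%)."""
    return (new_eer - baseline_eer) / baseline_eer
```

With this convention, a positive relative change is a degradation and a negative one is an improvement over the baseline.
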
Index Terms. vocal effort, speaker recognition, center of gravity ratio
Bibliographic reference. Harwardt, Corinna (2010): "Investigating the COG ratio as feature for speaker verification on high-effort speech", In DiSS-LPSS-2010, 35-38.