INTERSPEECH 2011
We previously proposed the use of the Spectral Subband Energy Ratio (SSER) as a speaker feature in a speaker verification system [1]. The SSER features were derived from two distinct components, the harmonic and noise parts of speech, which were decomposed from the original speech by the Harmonic plus Noise Model (HNM). In this paper, we report several recent improvements to this approach. First, we examine the two speech components in detail and achieve surprisingly better performance by extracting separate Spectral Subband Energy features from each component alone. Second, we propose a soft unvoiced/voiced (U/V) decision method that preserves more speech data during HNM analysis and feature extraction. Experimental results improve greatly, demonstrating the effectiveness of this soft U/V decision. Finally, we report a preliminary attempt to move feature extraction from the linear-frequency domain to the mel-frequency domain.
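To make the two kinds of feature concrete, the following is a minimal sketch of per-component subband energies and their ratio for a single frame. It is an illustration under stated assumptions, not the authors' implementation: the HNM decomposition is assumed to have been done already, and the function names, the equal-width linear-frequency bands, the band count, and the log ratio are all hypothetical choices that the abstract does not specify.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_energies(frame, num_bands):
    """Sum spectral power over equal-width linear-frequency subbands.

    Assumption: equal-width bands; the paper's actual band layout is not
    given in the abstract.
    """
    spec = power_spectrum(frame)
    edges = [round(i * len(spec) / num_bands) for i in range(num_bands + 1)]
    return [sum(spec[edges[i]:edges[i + 1]]) for i in range(num_bands)]


def log_energy_ratio(harmonic, noise, num_bands, eps=1e-12):
    """Per-band log ratio of harmonic to noise subband energy.

    Assumption: a log ratio with a small floor `eps`; this is one plausible
    form of a subband energy ratio, not a definition taken from the paper.
    """
    h = subband_energies(harmonic, num_bands)
    n = subband_energies(noise, num_bands)
    return [math.log((hi + eps) / (ni + eps)) for hi, ni in zip(h, n)]


# Toy frames standing in for the harmonic and noise parts of one frame.
N = 64
harmonic_part = [math.cos(2 * math.pi * 4 * t / N) for t in range(N)]
noise_part = [math.cos(2 * math.pi * 20 * t / N) for t in range(N)]

separate_harmonic = subband_energies(harmonic_part, 4)
separate_noise = subband_energies(noise_part, 4)
ratio = log_energy_ratio(harmonic_part, noise_part, 4)
```

In this sketch the "separate" features are simply the two energy vectors kept side by side, whereas the ratio collapses them into one vector per frame; the abstract reports that the separate features performed better.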
Bibliographic reference. Long, Yanhua / Yan, Zhi-Jie / Soong, Frank K. / Dai, Li-Rong / Guo, Wu (2011): "Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model", In INTERSPEECH-2011, 373-376.