12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model

Yanhua Long (1), Zhi-Jie Yan (2), Frank K. Soong (2), Li-Rong Dai (1), Wu Guo (1)

(1) USTC, China
(2) Microsoft Research Asia, China

We previously proposed the use of the Spectral Subband Energy Ratio (SSER) as a speaker feature in a speaker verification system [1]. Those SSER features were derived from two distinct components, the harmonic and noise parts of speech, which were decomposed from the original speech by the Harmonic plus Noise Model (HNM). In this paper, we report several recent improvements to this approach. First, we examine the two speech components in detail and achieve surprisingly better performance by extracting only the separate Spectral Subband Energy features from each component. Second, we propose a soft unvoiced/voiced (U/V) decision method to preserve more speech data during HNM analysis and feature extraction. Greatly improved experimental results demonstrate the effectiveness of this soft U/V decision. Finally, a preliminary attempt to move feature extraction from the linear frequency domain to the mel-frequency domain is also examined.
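
As a rough illustration of the kind of feature described above, the following Python sketch computes per-frame subband energies for a harmonic and a noise component, together with their log ratio. It assumes the HNM decomposition has already been performed elsewhere. The function names, the eight equal-width linear-frequency bands, the Hann window, and the log-ratio form are illustrative assumptions, not the definitions used in [1] or in this paper.

```python
import numpy as np

def subband_energies(frame, n_bands=8):
    """Sum one frame's power spectrum over equal-width linear-frequency bands.

    The band layout and window are assumptions made for illustration only.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # array_split tolerates a bin count that is not a multiple of n_bands
    return np.array([band.sum() for band in np.array_split(power, n_bands)])

def sser(harmonic_frame, noise_frame, n_bands=8, eps=1e-12):
    """Log ratio of harmonic to noise subband energies (hypothetical definition)."""
    e_h = subband_energies(harmonic_frame, n_bands)
    e_n = subband_energies(noise_frame, n_bands)
    # eps guards against division by zero and log(0) in empty bands
    return np.log((e_h + eps) / (e_n + eps))

# Toy stand-ins for the two HNM outputs: a 200 Hz tone and weak white noise,
# both 256 samples long at an assumed 8 kHz sampling rate.
rng = np.random.default_rng(0)
t = np.arange(256) / 8000.0
harmonic = np.sin(2 * np.pi * 200.0 * t)
noise = 0.01 * rng.standard_normal(256)

features = sser(harmonic, noise)
```

In this toy input the tone falls in the lowest band, so that band has the largest ratio. Calling `subband_energies` on each component separately gives features analogous to the per-component Spectral Subband Energy features mentioned in the abstract, again under the assumptions stated above.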


  1. Long, Y., Yan, Z-J., Soong, F. K., Dai, L. and Guo, W., “Speaker Characterization Using Spectral Subband Energy Ratio Based on Harmonic Plus Noise Model”, in Proc. ICASSP, 2011.


Bibliographic reference.  Long, Yanhua / Yan, Zhi-Jie / Soong, Frank K. / Dai, Li-Rong / Guo, Wu (2011): "Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model", In INTERSPEECH-2011, 373-376.