11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Statistical Modeling of F0 Dynamics in Singing Voices Based on Gaussian Processes with Multiple Oscillation Bases

Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino

NTT Corporation, Japan

We present a novel statistical model for the dynamics of various singing behaviors, such as vibrato and overshoot, in a fundamental frequency (F0) contour. These dynamics are important cues for perceiving the individuality of a singer, and can serve as a useful measure for applications such as singing skill evaluation and singing voice synthesis. While most previous studies have modeled the dynamics using a second-order linear system, automatic and accurate estimation of the model parameters has yet to be accomplished. In this paper, we first develop a complete stochastic representation of the second-order system with Gaussian processes from parametric discretization, and then propose a complete, efficient scheme for parameter estimation using the Expectation-Maximization (EM) algorithm. Experimental results show that the proposed method can decompose an F0 contour into a musical component and a dynamics component. Finally, we discuss estimating singing styles from each singer's model parameters.
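As a rough illustration of the second-order linear system mentioned in the abstract, the sketch below simulates a damped oscillator that tracks a stepwise note target and then splits the resulting contour into the target (musical) part and the residual (dynamics) part. It is not the Gaussian-process model or the EM estimation scheme of the paper. The function name `second_order_response`, the damping ratio `zeta`, the natural frequency `omega`, the 5 ms frame period, and the 200-cent note step are all assumptions chosen for illustration.

```python
def second_order_response(target, zeta, omega, dt=0.005):
    """Simulate y'' + 2*zeta*omega*y' + omega**2 * y = omega**2 * u.

    target: note-level F0 target u per frame (here in cents; assumed unit)
    zeta:   damping ratio (hypothetical value; < 1 gives overshoot)
    omega:  natural angular frequency in rad/s (hypothetical value)
    dt:     frame period in seconds (assumed 5 ms)
    Uses a simple semi-implicit Euler step, not the paper's discretization.
    """
    y = target[0]  # start at rest on the first target value
    v = 0.0        # initial velocity
    contour = []
    for u in target:
        accel = omega ** 2 * (u - y) - 2.0 * zeta * omega * v
        v += accel * dt  # update velocity first
        y += v * dt      # then position, using the new velocity
        contour.append(y)
    return contour


# Hypothetical note transition: 0 cents for 50 frames, then 200 cents.
target = [0.0] * 50 + [200.0] * 350
contour = second_order_response(target, zeta=0.3, omega=30.0)

# Residual between the simulated contour and the stepwise target.
dynamics = [y - u for y, u in zip(contour, target)]

peak = max(contour)
print(f"peak F0: {peak:.1f} cents, final F0: {contour[-1]:.1f} cents")
```

With an underdamped setting such as `zeta=0.3`, the simulated contour rises above the 200-cent target before settling, which is the overshoot behavior the abstract refers to. In this toy setup the target is known in advance; the paper addresses the harder problem of estimating both components and the system parameters from an observed contour.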


Bibliographic reference. Ohishi, Yasunori / Kameoka, Hirokazu / Mochihashi, Daichi / Nagano, Hidehisa / Kashino, Kunio (2010): "Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases", in INTERSPEECH-2010, 2598-2601.