We present a novel statistical model for the dynamics of various singing behaviors, such as vibrato and overshoot, in a fundamental frequency (F0) contour. These dynamics are important cues for perceiving a singer's individuality and can serve as useful features in applications such as singing skill evaluation and singing voice synthesis. Most previous studies have modeled these dynamics with a second-order linear system, but automatic and accurate estimation of the model parameters has yet to be achieved. In this paper, we first derive a complete stochastic representation of the second-order system in terms of Gaussian processes through parametric discretization, and we then propose a complete and efficient parameter estimation scheme based on the Expectation-Maximization (EM) algorithm. Experimental results show that the proposed method can decompose an F0 contour into a musical component and a dynamics component. Finally, we discuss how each singer's singing style can be estimated from the model parameters.
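To make the "second-order linear system" concrete, the following is a minimal deterministic sketch, not the authors' Gaussian-process model or their EM estimator. It assumes the textbook form y'' + 2ζΩy' + Ω²y = Ω²x, where x is a step-like target note contour (in cents) and y is the resulting F0 contour, and discretizes both derivatives with backward differences. The function name `second_order_response` and the values chosen below (a 6 Hz natural frequency, damping ratio ζ = 0.3, a 5 ms frame period) are hypothetical and purely illustrative. With ζ < 1 the output overshoots the target note before settling, which is the kind of behavior the abstract attributes to overshoot.

```python
import math


def second_order_response(x, omega, zeta, dt):
    """Hypothetical helper: filter a target contour x (cents) through the
    second-order system y'' + 2*zeta*omega*y' + omega**2 * y = omega**2 * x,
    approximating y'' and y' with backward differences at frame period dt."""
    # Coefficients of y[n], y[n-1], y[n-2] after solving the difference
    # equation for y[n].
    a0 = 1.0 / dt**2 + 2.0 * zeta * omega / dt + omega**2
    a1 = 2.0 / dt**2 + 2.0 * zeta * omega / dt
    a2 = 1.0 / dt**2
    y = []
    # Assume the system starts at rest on the first target value.
    y1 = y2 = x[0] if x else 0.0
    for xn in x:
        yn = (omega**2 * xn + a1 * y1 - a2 * y2) / a0
        y.append(yn)
        y2, y1 = y1, yn
    return y


# Illustrative usage: a note transition from 0 to 100 cents (one semitone).
dt = 0.005                                # assumed 5 ms frame period
target = [0.0] * 20 + [100.0] * 400       # step-like musical component
f0 = second_order_response(target, omega=2 * math.pi * 6.0, zeta=0.3, dt=dt)
peak = max(f0)                            # exceeds 100 cents: overshoot
```

The discrete system has unity gain at DC, so the output settles on the target pitch once the transient decays. Note that backward differences add some numerical damping, so this sketch slightly understates the overshoot of the continuous-time system.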
Bibliographic reference. Ohishi, Yasunori / Kameoka, Hirokazu / Mochihashi, Daichi / Nagano, Hidehisa / Kashino, Kunio (2010): "Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases", In INTERSPEECH-2010, 2598-2601.