The degree of "shout" in a singing performance is effectively controlled by combining global spectral shape equalization, peak cancellation in the frequency modulation spectrum of the F0 trajectory, and synchronized shape modulation of the voice spectral envelope. This "shout-reduction" processing is based on a symmetry-based F0 extractor with fine temporal resolution, a temporally stable representation of the instantaneous frequency of periodic signals, and TANDEM-STRAIGHT, a speech analysis, modification, and resynthesis framework. The proposed procedure successfully converted an expressive Japanese POP song performance with "shout" into a plain performance without damaging its original naturalness. The possibility of adding artificial "shout" to a plain performance is also discussed.
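To make the idea of peak cancellation in the frequency modulation spectrum of an F0 trajectory concrete, the following is a minimal sketch and not the authors' procedure. It assumes a uniformly sampled, fully voiced F0 trajectory and uses a plain FFT notch on log-F0. The function name `suppress_f0_modulation_peak` and the parameters `band_hz`, `half_width_hz`, and `gain` are hypothetical choices made for illustration; the abstract does not specify a search band or a cancellation filter.

```python
import numpy as np


def suppress_f0_modulation_peak(f0_hz, frame_rate_hz,
                                band_hz=(20.0, 200.0),
                                half_width_hz=5.0, gain=0.0):
    """Attenuate the strongest modulation component of an F0 trajectory.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    log_f0 = np.log(f0_hz)           # work on a log-frequency scale
    mean = log_f0.mean()             # keep the average pitch untouched

    # Modulation spectrum of the mean-removed log-F0 trajectory.
    spectrum = np.fft.rfft(log_f0 - mean)
    freqs = np.fft.rfftfreq(log_f0.size, d=1.0 / frame_rate_hz)

    # Search for the dominant peak only inside the assumed band.
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    if not in_band.any():
        return f0_hz.copy(), None
    band_idx = np.flatnonzero(in_band)
    peak_idx = band_idx[np.argmax(np.abs(spectrum[in_band]))]
    peak_hz = freqs[peak_idx]

    # Scale the bins around the peak (gain=0.0 removes them entirely).
    notch = np.abs(freqs - peak_hz) <= half_width_hz
    spectrum[notch] *= gain

    smoothed = np.fft.irfft(spectrum, n=log_f0.size) + mean
    return np.exp(smoothed), peak_hz
```

In this sketch, slow components such as vibrato lie below the assumed search band and pass through unchanged, while the strongest faster modulation is removed. A real system would also need to handle unvoiced frames and time-varying modulation, which this example ignores.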
Bibliographic reference. Nishigaki, Yuri / Sakakibara, Ken-Ichi / Morise, Masanori / Nisimura, Ryuichi / Irino, Toshio / Kawahara, Hideki (2013): "Controlling “shout” expression in a Japanese POP singing performance: analysis and suppression study", In INTERSPEECH-2013, 2905-2909.