A simple, high-speed F0 extractor with high temporal resolution is proposed, based on a measure of waveform symmetry. Strictly speaking, it is not an F0 extractor but a detector of the lowest prominent sinusoidal component, accompanied by a salience measure. When the signal under investigation is a sum of harmonic sinusoidal components, the detector can be combined with an F0 refinement procedure based on a stable representation of the instantaneous frequency of periodic signals. Applying the proposed algorithm revealed that rapid temporal modulations in both the F0 trajectory and the spectral envelope typically occur in expressive voices, such as lively singing performances. Manipulating these temporal fine structures (texture) effectively modified perceived expressiveness while preserving, to some extent, perceived vocal effort and register.
Index Terms: speech analysis, speech synthesis, expressive speech, singing voices
Bibliographic reference. Kawahara, Hideki / Morise, Masanori / Nisimura, Ryuichi / Irino, Toshio (2012): "Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation", In INTERSPEECH-2012, 386-389.