ISCA Archive SSW 2019

Loss Function Considering Temporal Sequence for Feed-Forward Neural Network–Fundamental Frequency Case

Noriyuki Matsunaga, Yamato Ohtani, Tatsuya Hirahara

This paper describes a novel loss function for training feedforward neural networks (FFNNs), which can generate smooth speech parameter sequences without post-processing. In statistical parametric speech synthesis based on deep neural networks (DNNs), maximum likelihood parameter generation (MLPG) or recurrent neural networks (RNNs) are generally used to generate smooth speech parameter sequences. However, because the MLPG process requires utterance-level processing, it is not suitable for speech synthesis requiring low latency. Furthermore, networks such as long short-term memory RNNs (LSTM-RNNs) have high computational costs. As RNNs are not recommended when computational resources are limited, we consider FFNNs as an alternative. One limitation of FFNNs is that they are trained without regard to the relationships between speech parameters in adjacent frames. To overcome this limitation and generate smooth speech parameter sequences from FFNNs alone, we propose a novel loss function that uses long- and short-term features from speech parameters. We evaluated the proposed loss function with a focus on the fundamental frequency (F0) and found that, using the proposed loss function, an FFNN-only approach can generate F0 contours that are perceptually equal to or better in terms of naturalness than those generated by MLPG or LSTM-RNNs.
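
The abstract does not give the form of the proposed loss, so the following is only a minimal illustrative sketch of the general idea of adding sequence-level terms to a frame-wise objective. Everything in it is an assumption made for illustration and not the authors' definition: the function name `temporal_sequence_loss`, the weights `w_short` and `w_long`, the choice of frame-to-frame differences as the short-term feature, and the choice of segment mean and variance as the long-term feature.

```python
import numpy as np


def temporal_sequence_loss(pred, target, w_short=1.0, w_long=1.0):
    """Illustrative loss over one segment of a parameter track (e.g. log F0).

    Hypothetical formulation, not the loss defined in the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Frame-wise squared error: the usual FFNN objective, which treats
    # every frame independently of its neighbours.
    frame_term = np.mean((pred - target) ** 2)

    # Assumed short-term feature: frame-to-frame differences.
    # Penalising their mismatch discourages frame-level jitter.
    short_term = np.mean((np.diff(pred) - np.diff(target)) ** 2)

    # Assumed long-term feature: mean and variance over the whole segment.
    long_term = (pred.mean() - target.mean()) ** 2 \
        + (pred.var() - target.var()) ** 2

    return frame_term + w_short * short_term + w_long * long_term


# Two predictions with the same frame-wise error against a smooth target:
# one is shifted by a constant, the other alternates above and below.
target = np.linspace(5.0, 5.5, 10)
offset = target + 0.1
jitter = target + 0.1 * np.array([1, -1] * 5)

print(temporal_sequence_loss(offset, target, w_long=0.0))
print(temporal_sequence_loss(jitter, target, w_long=0.0))
```

In this sketch the two predictions are indistinguishable to a plain frame-wise mean squared error, while the short-term term assigns the jittery contour a larger loss. That is the kind of behaviour a sequence-aware loss is intended to provide when no smoothing step such as MLPG follows the network.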


doi: 10.21437/SSW.2019-26

Cite as: Matsunaga, N., Ohtani, Y., Hirahara, T. (2019) Loss Function Considering Temporal Sequence for Feed-Forward Neural Network–Fundamental Frequency Case. Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10), 143-148, doi: 10.21437/SSW.2019-26

@inproceedings{matsunaga19_ssw,
  author={Noriyuki Matsunaga and Yamato Ohtani and Tatsuya Hirahara},
  title={{Loss Function Considering Temporal Sequence for Feed-Forward Neural Network–Fundamental Frequency Case}},
  year=2019,
  booktitle={Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10)},
  pages={143--148},
  doi={10.21437/SSW.2019-26}
}