A Recurrent Neural Network Approach to Audio Segmentation for Broadcast Domain Data

Pablo Gimeno, Ignacio Viñals, Alfonso Ortega, Antonio Miguel, Eduardo Lleida


This paper presents a new approach to automatic audio segmentation based on Recurrent Neural Networks. Our system exploits the capability of Bidirectional Long Short-Term Memory (BLSTM) networks to model the temporal dynamics of the input signals. The neural network is complemented by a resegmentation module that gains long-term stability by means of the tied-state concept from Hidden Markov Models. Furthermore, a feature exploration was carried out to find the best representation of the input data. The acoustic features considered are spectral log-filter-bank energies and musical features such as chroma. This new approach has been evaluated on the Albayzín 2010 audio segmentation evaluation dataset. The evaluation requires differentiating five audio conditions: music, speech, speech with music, speech with noise, and others. Competitive results were obtained, achieving a relative improvement of 15.75% over the best results reported in the literature for this database.
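The resegmentation idea can be illustrated with a minimal sketch: the network emits per-frame class posteriors, and a Viterbi pass with a constant penalty for every class change removes short spurious segments, giving the long-term stability the abstract mentions. This is a simplified stand-in for the tied-state HMM resegmentation, not the authors' implementation; the function name, penalty value, and overall setup are illustrative assumptions.

```python
import numpy as np

def viterbi_smooth(log_post, switch_penalty=2.0):
    """Smooth frame-level class log-posteriors with a Viterbi pass.

    A constant penalty is paid for every class change, which suppresses
    short spurious segments, loosely mimicking an HMM resegmentation
    stage (hypothetical sketch; the penalty plays the role of a
    log transition probability).

    log_post: (T, C) array of per-frame log-posteriors.
    Returns the smoothed label sequence of length T.
    """
    T, C = log_post.shape
    score = log_post[0].copy()           # best score ending in each class
    back = np.zeros((T, C), dtype=int)   # backpointers
    for t in range(1, T):
        # trans[c, p]: score of arriving at class c from class p,
        # paying the penalty whenever p != c
        trans = score[None, :] - switch_penalty * (1 - np.eye(C))
        back[t] = np.argmax(trans, axis=1)
        score = trans[np.arange(C), back[t]] + log_post[t]
    # trace back the best path
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# A single-frame outlier is smoothed away, while a sustained change
# of class is kept as a genuine segment boundary.
flip = np.log(np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] + [[0.9, 0.1]] * 2))
print(viterbi_smooth(flip))  # -> [0 0 0 0 0 0]
```

The trade-off is controlled by `switch_penalty`: a larger value enforces longer minimum segment durations at the cost of delayed boundary detection, which is the same tension the tied-state HMM resolves in the paper.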


DOI: 10.21437/IberSPEECH.2018-19

Cite as: Gimeno, P., Viñals, I., Ortega, A., Miguel, A., Lleida, E. (2018) A Recurrent Neural Network Approach to Audio Segmentation for Broadcast Domain Data. Proc. IberSPEECH 2018, 87-91, DOI: 10.21437/IberSPEECH.2018-19.


@inproceedings{Gimeno2018,
  author={Pablo Gimeno and Ignacio Viñals and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  title={{A Recurrent Neural Network Approach to Audio Segmentation for Broadcast Domain Data}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={87--91},
  doi={10.21437/IberSPEECH.2018-19},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-19}
}