A Note Based Query By Humming System Using Convolutional Neural Network

Naziba Mostafa, Pascale Fung


In this paper, we propose a note-based query by humming (QBH) system that combines a Hidden Markov Model (HMM) with a Convolutional Neural Network (CNN). We take a note-based approach because note-based systems are much more efficient than traditional frame-based systems. A note-based QBH system has two main components: humming transcription and candidate melody retrieval.

For humming transcription, we are the first to use a hybrid HMM-CNN model. We use the CNN for its ability to learn features directly from raw audio data and to model the locality and variability often present in a note, and we use the HMM to handle variability along the time axis.

For candidate melody retrieval, we use locality-sensitive hashing to narrow down the candidates, then dynamic time warping and earth mover’s distance for the final ranking of the selected candidates.
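
The dynamic time warping step in the retrieval stage can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of a melody as a list of MIDI pitch numbers, the absolute pitch difference as local cost, and the function names `dtw_distance` and `rank_candidates` are all assumptions made for illustration. The hashing and earth mover's distance stages are omitted.

```python
def dtw_distance(query, candidate):
    """Classic dynamic time warping cost between two pitch sequences.

    Assumption: each melody is a list of MIDI pitch numbers and the
    local cost is the absolute pitch difference.
    """
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = best alignment cost of query[:i] against candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - candidate[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query element repeated in alignment
                cost[i][j - 1],      # candidate element repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return candidate names sorted by increasing DTW cost (hypothetical helper)."""
    return sorted(candidates, key=lambda name: dtw_distance(query, candidates[name]))


# Toy database: names and pitch values are made up for illustration.
database = {"up": [60, 62, 64], "down": [64, 62, 60]}
hummed = [60, 60, 62, 64, 64]  # a time-stretched version of "up"
ranking = rank_candidates(hummed, database)
```

Because warping lets one element align with several, the stretched query matches "up" at zero cost, which is why DTW suits hummed input whose tempo differs from the reference melody.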

We show that our HMM-CNN humming transcription system outperforms other state-of-the-art humming transcription systems by ~2% under the transcription evaluation framework of Molina et al. Our overall query by humming system achieves a Mean Reciprocal Rank of 0.92 on the standard MIREX dataset, which is higher than other state-of-the-art note-based query by humming systems.
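
For reference, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the correct melody appears. The sketch below shows the computation on made-up data. The function name `mean_reciprocal_rank` and the choice to score a query as 0 when its target is missing from the returned list are assumptions, not details taken from the paper or the MIREX protocol.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the correct item across queries.

    Assumption: a query whose target is absent from its ranked list
    contributes 0 to the sum.
    """
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(targets)


# Two illustrative queries: the first finds its target at rank 1,
# the second at rank 2.
results = [["song_a", "song_b"], ["song_b", "song_a"]]
truth = ["song_a", "song_a"]
mrr = mean_reciprocal_rank(results, truth)
```

In this toy case the two reciprocal ranks are 1 and 1/2, so the mean is 0.75.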


DOI: 10.21437/Interspeech.2017-1590

Cite as: Mostafa, N., Fung, P. (2017) A Note Based Query By Humming System Using Convolutional Neural Network. Proc. Interspeech 2017, 3102-3106, DOI: 10.21437/Interspeech.2017-1590.


@inproceedings{Mostafa2017,
  author={Naziba Mostafa and Pascale Fung},
  title={A Note Based Query By Humming System Using Convolutional Neural Network},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3102--3106},
  doi={10.21437/Interspeech.2017-1590},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1590}
}