Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images

Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan


Recent developments in real-time magnetic resonance imaging (rtMRI) have enabled the study of vocal tract dynamics during production of running speech at high frame rates (e.g., 83 frames per second). The large amounts of data acquired require scalable automated methods to identify different articulators (e.g., tongue, velum) for further analysis. In this paper, we propose a convolutional neural network with an encoder-decoder architecture to jointly detect the relevant air-tissue boundaries and label them, a task we refer to as ‘semantic edge detection’. We pose this as a pixel labeling problem, with the outline contour of each articulator of interest as a positive class and the remaining tissue and airway as negative classes. We introduce a loss function with an additional penalty for misclassification at air-tissue boundaries to account for class imbalance and improve edge localization. We then use a greedy search algorithm to draw contours from the probability maps of the positive classes predicted by the network. The articulator contours obtained by our method are comparable to the true labels generated by iteratively fitting a manually created subject-specific template. Our results generalize well across subjects and different vocal tract postures, demonstrating a significant improvement over the structured regression baseline.
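
The abstract does not give the exact form of the boundary penalty, so the following is only a minimal sketch of one common way to realize the idea: a per-pixel cross-entropy in which pixels on an air-tissue boundary receive a larger weight. The function name `boundary_weighted_cross_entropy`, the `boundary_mask` input, and the `boundary_weight` value are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def boundary_weighted_cross_entropy(probs, labels, boundary_mask,
                                    boundary_weight=5.0, eps=1e-12):
    """Weighted mean cross-entropy over all pixels (illustrative only).

    probs:          (H, W, C) per-pixel class probabilities (e.g. softmax output)
    labels:         (H, W) integer class index for each pixel
    boundary_mask:  (H, W) boolean, True where a pixel lies on a boundary
    boundary_weight: hypothetical extra weight applied to boundary pixels
    """
    h, w, _ = probs.shape
    rows, cols = np.indices((h, w))

    # Probability the model assigned to the true class at each pixel.
    p_true = probs[rows, cols, labels]

    # Standard per-pixel cross-entropy, clipped to avoid log(0).
    per_pixel = -np.log(np.clip(p_true, eps, 1.0))

    # Boundary pixels count more, so errors there are penalized more heavily.
    weights = np.where(boundary_mask, boundary_weight, 1.0)

    # Normalize by the total weight so the loss scale stays comparable.
    return float((weights * per_pixel).sum() / weights.sum())
```

With this weighting, a misclassified boundary pixel raises the loss more than the same mistake elsewhere in the image, which is the qualitative behavior the abstract describes. The weighting scheme actually used in the paper may differ.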


DOI: 10.21437/Interspeech.2017-1580

Cite as: Somandepalli, K., Toutios, A., Narayanan, S.S. (2017) Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. Proc. Interspeech 2017, 631-635, DOI: 10.21437/Interspeech.2017-1580.


@inproceedings{Somandepalli2017,
  author={Krishna Somandepalli and Asterios Toutios and Shrikanth S. Narayanan},
  title={Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={631--635},
  doi={10.21437/Interspeech.2017-1580},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1580}
}