ISCA - International Speech Communication Association
ISCA Archive
Master 2 Internship Proposal
Advisors: Jules Cauzinille, Benoît Favre, Arnaud Rey
November 2023
Deep transfer knowledge from speech to primate vocalizations
Keywords: Computational bioacoustics, deep learning, self-supervised learning, transfer knowledge, efficient fine-tuning, primate vocalizations
1 Context
This internship is part of a multidisciplinary research project aimed at bridging the gap between state-of-the-art deep learning methods developed for speech processing and computational bioacoustics. Computational bioacoustics is a relatively new research field that studies animal acoustic communication with computational approaches [Stowell, 2022]. Bioacousticians have recently shown increasing interest in the deep learning revolution embodied by transformer architectures and self-supervised pre-trained models, but much investigation remains to be carried out. We propose to test the viability of self-supervision and knowledge transfer as a bioacoustic tool by pre-training models on speech and using them for primate vocalization analysis.
2 Problem Statement
Speech-based models reach convincing performance on primate-related tasks, including segmentation, individual identification, and call type classification [Sarkar and Doss, 2023], as they do on many other downstream tasks (such as vocal emotion recognition [Wang et al., 2021]). We have tested publicly available models such as HuBERT [Hsu et al., 2021] and Wav2Vec2 [Schneider et al., 2019], two self-supervised speech-based architectures, on some of these tasks with gibbon vocalizations. Our method involves probing and traditional fine-tuning of these models.
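As an illustration of what probing can look like, the following minimal PyTorch sketch trains only a linear classifier on top of a frozen encoder. It is not the project's actual pipeline: `ToyEncoder` is a hypothetical stand-in for a pre-trained speech model such as HuBERT, and the feature size, number of classes, and random input clips are assumptions chosen so the example runs without downloading any weights.

```python
import torch
from torch import nn


class FrozenEncoderProbe(nn.Module):
    """Linear probe: a frozen encoder followed by a trainable linear classifier."""

    def __init__(self, encoder: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for param in self.encoder.parameters():
            param.requires_grad = False  # probing never updates the encoder
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            frames = self.encoder(waveform)  # (batch, time, feature_dim)
        pooled = frames.mean(dim=1)          # average frame features over time
        return self.classifier(pooled)       # (batch, num_classes)


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a pre-trained speech encoder (assumption)."""

    def __init__(self, feature_dim: int = 32):
        super().__init__()
        self.conv = nn.Conv1d(1, feature_dim, kernel_size=400, stride=320)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, 1, samples) -> (batch, feature_dim, time)
        features = self.conv(waveform.unsqueeze(1))
        return features.transpose(1, 2)      # (batch, time, feature_dim)


torch.manual_seed(0)
probe = FrozenEncoderProbe(ToyEncoder(32), feature_dim=32, num_classes=4)
batch = torch.randn(8, 16000)                # 8 random one-second clips at 16 kHz
labels = torch.randint(0, 4, (8,))           # 4 made-up call types
optimizer = torch.optim.Adam(probe.classifier.parameters(), lr=1e-2)

# One training step: gradients reach the classifier only.
logits = probe(batch)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()

trainable = [name for name, p in probe.named_parameters() if p.requires_grad]
print(trainable)
```

Traditional fine-tuning differs only in that the encoder parameters are left trainable and passed to the optimizer as well.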
To ensure true knowledge transfer from the speech pre-training datasets to the downstream classification tasks, the goal of this internship will be to implement efficient fine-tuning methods in a similar fashion. These methods make it possible to limit and control the amount of information lost during fine-tuning. Depending on the interests of the candidate, they can include prompt tuning [Lester et al., 2021], attention prompting [Gao et al., 2023], low-rank adaptation [Hu et al., 2021], or adversarial reprogramming [Elsayed et al., 2018]. The candidate will also be free to explore other methods relevant to the question at hand, either on gibbons or on datasets of other species that are currently being collected.
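To make one of these options concrete, here is a minimal sketch of low-rank adaptation applied to a single linear layer, written in plain PyTorch. The class name `LoRALinear`, the rank, the scaling factor, and the layer width of 768 are illustrative assumptions, not choices made by the project; in practice the wrapper would be applied to selected projection layers of a pre-trained model.

```python
import math

import torch
from torch import nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pre-trained weights stay untouched
        # A starts random and B starts at zero, so the wrapped layer
        # initially computes exactly the same function as the base layer.
        self.lora_a = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.T) @ self.lora_b.T  # low-rank path
        return self.base(x) + self.scaling * update


torch.manual_seed(0)
base = nn.Linear(768, 768)       # illustrative layer width (assumption)
layer = LoRALinear(base, rank=4)
x = torch.randn(2, 10, 768)      # (batch, frames, features)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total}")
```

Because the base weights never change, the knowledge acquired during speech pre-training is preserved, and the rank bounds how far the adapted layer can move away from it.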
3 Profile
The intern will propose and implement efficient fine-tuning solutions on an array of (preferably self-supervised) acoustic models pre-trained on speech or general sound, such as HuBERT, Wav2vec, WavLM, VGGish, etc. Adversarial reprogramming of models pre-trained on other modalities (images, videos, etc.) could also be explored. The work will be implemented using PyTorch.
The candidate must have the following qualities:
• Excellent knowledge of deep learning methods
• Extensive experience with PyTorch models
• An interest in processing bioacoustic data
• An interest in reading and writing scientific papers as well as some curiosity for research challenges
The internship will last 6 months at the LIS and LPC laboratories in Marseille during spring 2024.
The candidate will work in close collaboration with Jules Cauzinille as part of his thesis on “Self-supervised learning for primate vocalization analysis”. The candidate will also be in contact with the research community of the ILCB.
4 Contact
Please send a CV, transcripts, and a letter of application to jules.cauzinille@lis-lab.fr, benoit.favre@lislab.fr, and arnaud.rey@cnrs.fr. Do not hesitate to contact us if you have any questions (or if you want to hear what our primates sound like).
References
Gamaleldin F. Elsayed, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial reprogramming of neural networks, 2018.
Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, and Yu Qiao. Llama-adapter v2: Parameter-efficient visual instruction model, 2023.
Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, PP:1–1, 2021. doi: 10.1109/TASLP.2021.3122291.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models, 2021.
Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning, 2021.
Eklavya Sarkar and Mathew Magimai Doss. Can Self-Supervised Neural Networks Pre-Trained on Human Speech distinguish Animal Callers?, May 2023. arXiv:2305.14035 [cs, eess].
Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. wav2vec: Unsupervised Pre-Training for Speech Recognition. In Proc. Interspeech 2019, pages 3465–3469, 2019. doi: 10.21437/Interspeech.2019-1873.
Dan Stowell. Computational bioacoustics with deep learning: a review and roadmap. PeerJ, 10:e13152, 2022. ISSN 2167-8359. doi: 10.7717/peerj.13152. URL https://peerj.com/articles/13152.
Yingzhi Wang, Abdelmoumene Boumadane, and Abdelwahab Heba. A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding. CoRR, abs/2111.02735, 2021. doi: 10.48550/arXiv.2111.02735
PhD Title: SUMMA-Sound: SUMMarization of Activities of daily living using Sound-based activity recognition
Partnership:
IMT Atlantique Campus: ☒ Brest ☐ Nantes ☐ Rennes
Laboratory: Lab-STICC
Doctoral school: ☒ SPIN ☐ 3MG
Funding: IMT Atlantique, co-tutelle with Instituto Superior Técnico
Context:
IMT Atlantique, internationally recognised for the quality of its research, is a leading general engineering school under the aegis of the French Ministry of Industry and Digital Technology, ranked in the three main international rankings (THE, SHANGHAI, QS). Located on three campuses, Brest, Nantes and Rennes, IMT Atlantique aims to combine digital technology and energy to transform society and industry through training, research and innovation. It aims to be the leading French higher education and research institution in this field on an international scale. With 290 researchers and permanent lecturers, 1000 publications and 18 M€ of contracts, it supervises 2300 students each year and its training courses are based on cutting-edge research carried out within 6 joint research units: GEPEA, IRISA, LATIM, LABSTICC, LS2N and SUBATECH.
The proposed thesis is part of the research activities of the team RAMBO (Robot interaction, Ambient systems, Machine learning, Behaviour, Optimization) and of the laboratory Lab-STICC and the department of Computer Science of IMT Atlantique.
Scientific context:
The objective of this thesis is to develop a method for collecting and summarizing domestic health-related data relevant for medical diagnosis, in a non-intrusive manner using audio information. This research addresses the lack of existing practical tools for providing high-level succinct information to medical staff on the evolution of patients they follow for health diagnostic purposes. This research is based on the assumption that valuable diagnostic data can be collected by observing short- and long-term lifestyle changes and behavioural anomalies. It relies on the latest advances in the domains of audio-based activity recognition, summarization of human activity, and health diagnosis. Research on health diagnosis in domestic environments has already explored a variety of sensors and modalities for gathering data on human health indicators [5].
Nevertheless, audio-based activity recognition is notable for its less intrusive nature. Employing state-of-the-art sound-based activity recognition models [2] to monitor domestic human activity, the thesis will investigate and develop methods for summarizing human activity [3] in human-understandable language, in order to produce data that is easy to interpret for doctors who monitor their patients remotely [4]. This work continues and fosters the research of the RAMBO team at IMT Atlantique on ambient systems that enable older adults and dependent populations to age well at home [1]. We expect this thesis to provide technology likely to relieve the burden on gerontologists and elderly-care facilities, and to alleviate the caregiver shortage by offering automatic support for the task of monitoring elderly or handicapped people, enabling them to age at home while still being followed by medical specialists using automated means.
Expected contributions of the thesis
Scientific goals:
(1) Determine the set of human activities relevant for health diagnosis
(2) Implement a state-of-the-art model for audio-based activity recognition and have its function validated by clinicians
(3) Develop a model for summarizing the evolution of human activity over time intervals of arbitrary duration (typically spanning from days to months and possibly years)
Expected outcomes of the PhD:
(1) A model for semantic summarization of human activity, based on sound recognition of activities of daily living
(2) A proof of concept for this model
Candidate profile and required skills:
• Master Degree in Computer Science (or equivalent)
• Programming and Software Engineering skills (Python, Git, Software Architecture Design)
• Data science skills
• Machine learning skills
• English speaking and writing skills
References:
[1] Damien Bouchabou. “Human activity recognition in smart homes : tackling data variability using context-dependent deep learning, transfer learning and data synthesis”. Theses. Ecole nationale supérieure Mines-Télécom Atlantique, May 2022. url: https://theses.hal.science/tel-03728064.
[2] Detection and Classification of Acoustic Scenes and Events (DCASE). url: https://dcase.community/challenge2022/task-soundevent-detection-in-domestic-environments (visited on 07/01/2022).
[3] P Durga et al. “When less is better: A summarization technique that enhances clinical effectiveness of data”. In: Proceedings of the 2018 International Conference on Digital Health. 2018, pp. 116–120.
[4] Akshay Jain et al. “Linguistic summarization of in-home sensor data”. In: Journal of Biomedical Informatics 96 (2019), p. 103240. issn: 1532-0464.
[5] Mostafa Haghi Kashani et al. “A systematic review of IoT in healthcare: Applications, techniques, and trends”. In: Journal of Network and Computer Applications 192 (2021), p. 103164.
Work Plan:
The thesis will be organised in the following steps: (1) Definition of pertinent sounds and activities for health diagnosis, (2) Hardware set-up, (3) Dataset constitution, (4) Activity recognition, (5) Diarization of activities, (6) Summarization, (7) Validation in a real environment.
Application:
To apply for this position, please send an email with your Curriculum Vitae, a document with your academic results (if possible), and a couple of lines describing your motivation to pursue a PhD to mihai[dot]andries[at]imt-atlantique[dot]fr before 16 May 2023.
Additional Information:
Application deadline: 16 May 2023
Start date: Fall 2023
Contract duration: 36 months
Location: Brest (France) and Lisbon (Portugal)
Contact(s):
Mihai ANDRIES (mihai[dot]andries[at]imt-atlantique.fr)
Plinio Moreno (plinio[at]isr.tecnico.ulisboa.pt)
We’re happy to announce a new research position in the field of speech and text anonymization at the German Research Center for Artificial Intelligence, Berlin, Germany. We’re looking for a full-time researcher at the Researcher or Junior Researcher level, and we offer a 2-year contract with optional extension and the prospect of a PhD.
© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.