ISCA - International Speech Communication Association

  • 2024-06-24 11:35 | Anonymous member

    KU Leuven's Faculty of Engineering Science has an open position for a junior professor (tenure track) in the area of Spoken Language Technologies. The successful candidate will conduct research on current challenges of speech technology and its applications, teach courses in the Master of Engineering Science and supervise students in the Master and PhD programs. The candidate will be embedded in the PSI research division of the Department of Electrical Engineering. More information is available online. The deadline for applications is September 30, 2024. 

    KU Leuven is committed to creating a diverse environment. It explicitly encourages candidates from groups that are currently underrepresented at the university to submit their applications. 

  • 2024-05-08 12:02 | Anonymous member (Administrator)

    Saarland University is a campus university with an international focus and a strong research profile. With numerous internationally respected research institutes on campus and dedicated support for collaborative projects, Saarland University is an ideal environment for innovation and technology transfer. The German Research Center for Artificial Intelligence (DFKI) is Germany's leading application-driven research institute with a core technology transfer mission. DFKI is currently the world's largest research centre for artificial intelligence operated as a public-private partnership. DFKI maintains close collaborative ties with national and international companies and is firmly rooted in the worldwide scientific AI landscape.

    To further strengthen this excellence in research and teaching, the Department of Language Science and Technology (LST) in collaboration with the German Research Center for Artificial Intelligence (DFKI) is inviting applications for the following position:

    Professorship (W3) in Language Technology

    (m/f/x; Reference: W2464)

    This position is a permanent public sector appointment (equivalent to a 'full-tenured professorship') starting at the earliest possible opportunity. We are looking for an experienced researcher in the field of language technology who has extensive knowledge of natural language processing and machine learning/AI methodologies. Experience with dialogue systems and reinforcement learning, the development of foundation models and/or trustworthy Artificial Intelligence is also desirable. In addition to holding a professorship at the university, the successful candidate will also be appointed as a scientific director at the German Research Center for Artificial Intelligence (DFKI), where they will head a research department. DFKI is an application-driven research organization that is largely financed through external project funding. A demonstrated ability to attract significant external funding for research projects at the national and international level is therefore essential. We also expect candidates to have experience in interdisciplinary research and in collaborating with industrial partners. The Department of Language Science and Technology is internationally recognized for its collaborative and interdisciplinary research, and the successful candidate will be expected to contribute to relevant joint research initiatives. Language technologies are core elements of our study programmes at the M.Sc./M.A. and B.Sc./B.A. level and the person appointed will teach courses within these programmes.

    What we can offer you:

    The successful candidate will conduct world-class research, lead their own research group at the university and perform teaching and supervisory duties at the undergraduate, graduate and doctoral levels. At DFKI, the person appointed will lead a research department with access to an extensive worldwide network of industrial and other research partners, facilitating research and impact at a scale that is otherwise difficult to achieve. The position offers excellent working conditions in a lively and international scientific community. Saarland University is one of the leading centres for language science and computational linguistics in Europe and offers a dynamic and stimulating research environment. The Department of Language Science and Technology (LST) employs about 100 research staff across nine research groups in the fields of computational linguistics, natural language processing, psycholinguistics, phonetics and speech science, speech processing, and corpus linguistics. The department serves as the focal point of the Collaborative Research Centre 1102 'Information Density and Linguistic Encoding' and of 'Neuroexplicit Models of Language, Vision, and Action', both of which involve close collaboration with DFKI. The LST department and the DFKI are both part of the Saarland Informatics Campus (SIC), which brings together some 800 researchers and over 2000 students from 81 countries. SIC is a collaboration between Saarland University and world-class research institutions on campus, which in addition to DFKI include the Max Planck Institute for Informatics and the Max Planck Institute for Software Systems.


    The appointment will be made in accordance with the general provisions of German public sector employment law. Candidates must have experience in and an aptitude for academic teaching. They will have a PhD or doctorate in an appropriate subject and will have demonstrated a particular capacity for independent academic research, typically by having obtained an advanced, post-doctoral research degree (Habilitation), by having published an equivalent volume of peer-reviewed research, or by having been appointed to a junior professorship or similar position. They will have a proven track record of leading their own research group and of acquiring external research funding. The successful candidate will be expected to actively contribute to departmental research and teaching. The language of instruction is English (in the M.Sc. and M.A. programmes) and German (in the B.Sc./B.A. programmes). We expect the successful candidate either to have sufficient proficiency to teach in both languages or to be willing to acquire this level of proficiency within an appropriate period.

    Your Application:

    Applications should be submitted online. No additional paper copy is required. The application must contain:

    • a letter of application and CV/résumé (including your telephone number and email address)
    • a complete list of your academic publications
    • a complete list of external funding (stating your own share if you were not the sole beneficiary)
    • your proposed research concept (2–5 pages)
    • your teaching concept (1 page)
    • copies of your degree certificates
    • complete copies of your five most significant publications
    • the names of three academic references (including email addresses), at least one of whom is not one of your previous academic supervisors
    • if you hold a university degree from a foreign university, proof of equivalence from Germany's Central Office for Foreign Education (ZAB), if available; if proof of equivalence has not been requested at the time of application, it must be submitted later upon request

    Applications must be received no later than May 30, 2024.

    Please include the job reference number W2464 when you apply. Selected candidates will be interviewed. If you have any questions, please contact:

    At Saarland University, we view internationalization as a process spanning all aspects of university life. We therefore expect members of our professorial staff to engage in activities that promote and foster further internationalization. Special support will be provided for projects that maintain collaborative interactions within existing international cooperative networks, e.g. projects with partners in the European University Alliance Transform4Europe or the University of the Greater Region (www.uni-

    Saarland University is an equal opportunity employer. In accordance with its affirmative action policy, Saarland University is actively seeking to increase the proportion of women in this field. Qualified women candidates are therefore strongly encouraged to apply. Preferential consideration will be given to applications from disabled candidates of equal eligibility. We welcome applications regardless of nationality, ethnic and social origin, religion/belief, age, sexual orientation and identity.

    When you submit a job application to Saarland University you will be transmitting personal data. Please refer to our privacy notice for information on how we collect and process personal data in accordance with Art. 13 of the General Data Protection Regulation (GDPR). By submitting your application, you confirm that you have taken note of the information in the Saarland University privacy notice.

    The full job advertisement can be found at:

  • 2024-02-19 16:44 | Anonymous member

    The Laboratory of Language Technology at Tallinn University of Technology, Estonia, is looking to fill a postdoc position in the field of speech processing and/or NLP. The position is funded by EXAI, the Estonian Centre of Excellence in Artificial Intelligence (2024−2030).

    The position is flexible with respect to topic, but it should connect thematically with the research group's current interests (speech recognition, speaker and language recognition, speaker diarization, spoken language translation, summarization, low-resource scenarios). Possible research directions include using and fine-tuning different speech and language foundation models (such as wav2vec 2.0, Whisper, or LLMs) for various speech and language processing tasks.
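    Many of these directions share one pattern: a frozen foundation model encodes audio into embeddings, and only a small task-specific head is trained on top (a "linear probe"). Below is a minimal numpy sketch of that pattern with random stand-in embeddings; the dimensions, data, and task are illustrative assumptions, not part of the posting.

    ```python
    import numpy as np

    # Hypothetical stand-in: 200 utterances already encoded into 768-dim
    # embeddings by a frozen foundation model (e.g. wav2vec 2.0 or Whisper).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 768))
    true_w = rng.normal(size=768)
    y = (X @ true_w > 0).astype(float)  # synthetic binary task labels

    # Train a logistic-regression "probe" head on the frozen embeddings.
    w = np.zeros(768)
    for _ in range(300):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step on the log-loss

    accuracy = np.mean(((X @ w) > 0) == (y == 1))
    print(f"probe accuracy: {accuracy:.2f}")
    ```

    In practice the frozen encoder would come from a library such as HuggingFace transformers, and full fine-tuning would also update the encoder weights rather than only the head.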

    The position does not include any teaching load, but supervision of Master and PhD students is expected.

    We are looking for candidates who have finished, or are about to complete, a PhD degree in speech processing, NLP or a related discipline. You must be proficient in English (spoken and written). Applicants should have demonstrated their research expertise through high-quality publications.

    The starting salary for this position is around 3500 euros per month (before taxes, around 2700 euros after taxes) and increases with experience. Additional benefits include roughly 6 weeks of paid annual leave, paid sick leave, and maternity and parental leave. The initial appointment will be for two years; the position may be extended and converted into a permanent researcher position if that suits both parties. The starting date is March 2024 or later; we are willing to adapt to the timing requirements of an ideal candidate.

    How to apply:

    Please send an e-mail to Tanel Alumäe with the following information:

    * a short statement (just a few sentences) of research interests that motivates why you are applying for this position;
    * a full CV including your list of publications;

    Or, just apply via LinkedIn:

    Unofficial inquiries about the position are also welcome!

  • 2024-01-04 15:22 | Anonymous

    We are offering a research internship (Master's level, Bac+5) in the research department of the Institut National de l'Audiovisuel (INA). The internship concerns voice activity detection in audiovisual corpora using self-supervised representations.

    Please find the detailed internship offer attached.

    Other internships are also available at INA; the full list of topics can be found on the following page:


    Voice activity detection in audiovisual corpora using self-supervised representations. Final-year engineering or Master 2 internship – academic year 2023-2024


    Keywords: deep learning, machine learning, self-supervised models, voice activity detection, speech activity detection, wav2vec 2.0

    Context

    The Institut National de l'Audiovisuel (INA) is a French public industrial and commercial institution (EPIC) whose main mission is to preserve and promote France's audiovisual heritage through the sale of archives and the management of the legal deposit. To this end, the Institute continuously captures 180 television and radio channels and stores more than 25 million hours of audiovisual content. INA also carries out training, production and scientific research activities. For more than 20 years, INA's research department has worked on the automatic indexing and description of these collections across all modalities: text, sound and images. The department takes part in numerous collaborative research projects, both national and European, and hosts Master's interns as well as doctoral students co-supervised with leading national laboratories. This internship is offered within the research team and is part of a collaborative project funded by the ANR: Gender Equality Monitor (GEM). Other internship topics are also available within the team:

    Internship objectives

    Voice Activity Detection (VAD) is an audio analysis task that aims to identify the portions of a recording that contain human speech, distinguishing them from the parts of the signal containing silence, background noise or music. Often regarded as a preprocessing step, this method is used upstream of automatic speech, speaker or emotion recognition. While existing VAD tools achieve excellent results on news programmes and studio broadcasts [Dou18a, Bre23], recent research at INA has shown that the performance of state-of-the-art systems is lower on a wide range of material that is under-represented in annotated speech corpora. This content, which was the subject of an internal annotation campaign, includes music programmes, cartoons, sport, fiction, game shows and documentaries. The objective of the internship is to develop voice activity detection (VAD) models based on the self-supervised learning paradigm and on transformer architectures such as wav2vec 2.0 [Bae20]. Models based on these architectures achieve state-of-the-art results on many speech processing tasks with limited amounts of annotated examples: transcription, understanding, translation, emotion detection, speaker recognition, language detection, etc. [Li22, Huh23, Par23]. Several recent studies have demonstrated the effectiveness of self-supervised approaches for VAD [Gim21, Kun23], but to date these models have been trained and evaluated on data that does not reflect the diversity of audiovisual content. The proposed internship aims to exploit the millions of hours of audiovisual content preserved at INA to train and improve these models.
    The resulting models will be integrated into the open-source software inaSpeechSegmenter, which is used, among other things, to measure the speaking time of women and men in broadcast programmes for research and regulatory purposes [Dou18b, Arc23].
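    For contrast with the self-supervised approach the internship targets, the classical VAD baseline is simple frame-energy thresholding; the sketch below is illustrative (frame length, threshold, and test signal are assumptions), and it is exactly the kind of method that struggles on music-heavy or noisy content.

    ```python
    import numpy as np

    def energy_vad(signal, frame_len=400, threshold=0.01):
        """Mark a frame as speech (True) when its mean energy exceeds a threshold."""
        n_frames = len(signal) // frame_len
        frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
        return (frames ** 2).mean(axis=1) > threshold

    # Synthetic signal at 16 kHz: 1 s of near-silence, then 1 s of a 440 Hz tone.
    sr = 16000
    t = np.arange(sr) / sr
    silence = 0.001 * np.random.default_rng(0).normal(size=sr)
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    decisions = energy_vad(np.concatenate([silence, tone]))
    print(decisions.sum(), "of", len(decisions), "frames flagged as active")
    ```

    A self-supervised model such as wav2vec 2.0 replaces the raw energy feature with learned contextual representations, which is what makes it robust on content where energy alone is ambiguous (music, cartoons, sport).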

    Dissemination of the work

    Depending on the maturity of the results and the directions envisaged for future work, several dissemination strategies will be considered:

    ● Release of the resulting models under an open-source licence on HuggingFace and/or INA's GitHub repository:

    ● Writing of scientific publications

    Internship conditions

    The internship will run for 4 to 6 months within INA's research department, at the Bry 2 site, 28 Avenue des frères Lumière, 94360 Bry-sur-Marne. The intern will be supervised by Valentin Pelloin and David Doukhan. A computer equipped with a GPU will be provided, as well as access to the Institute's computing cluster.

    Stipend: €760 gross per month + 50% of the Navigo travel pass

    Remote work: possible one day per week

    Contact

    To apply for this internship, or to request further information, please send your CV and cover letter by e-mail to the following addresses:

    Candidate profile

    ● Final-year student of a five-year degree (Bac+5) in computer science or AI

    ● Strong interest in academic research

    ● Interest in automatic speech processing

    ● Proficiency in Python and experience with ML libraries

    ● Ability to carry out literature searches

    ● Rigour, synthesis, autonomy, ability to work in a team


    [Arc23] ARCOM (2023). “La représentation des femmes à la télévision et à la radio - Rapport sur l'exercice 2022” [online].

    [Bae20] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” Neural Information Processing Systems, Jun. 2020.

    [Bre23] Bredin, H. (2023). pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe. In INTERSPEECH 2023, ISCA, pp. 1983–1987.

    [Dou18a] Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018, April). An open-source speaker gender detection framework for monitoring gender equality. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5214-5218). IEEE.

    [Dou18b] Doukhan, D., Poels, G., Rezgui, Z., & Carrive, J. (2018). Describing gender equality in french audiovisual streams with a deep learning approach. VIEW Journal of European Television History and Culture, 7(14), 103-122.

    [Gim21] P. Gimeno, A. Ortega, A. Miguel, and E. Lleida, “Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021,” in Interspeech 2021, ISCA, Aug. 2021, pp. 4359–4363.

    [Huh23] Huh, J., Brown, A., Jung, J. W., Chung, J. S., Nagrani, A., Garcia-Romero, D., & Zisserman, A. (2023). Voxsrc 2022: The fourth voxceleb speaker recognition challenge. arXiv preprint arXiv:2302.10248.

    [Kun23] M. Kunešová and Z. Zajíc, “Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.

    [Li22] Li, M., Xia, Y., & Lin, F. (2022, December). Incorporating VAD into ASR System by Multi-task Learning. In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 160-164). IEEE.

    [Par23] Parcollet, T., Nguyen, H., Evain, S., Boito, M. Z., Pupier, A., Mdhaffar, S., ... & Besacier, L. (2023). LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech. arXiv preprint arXiv:2309.05472.

  • 2024-01-04 15:21 | Anonymous

    The SAMoVA team at IRIT in Toulouse is offering several internships (M1, M2, final-year engineering projects) in 2024 on the following topics (non-exhaustive list):


    - Automatic Generation of Musical Scores in the Choro Style

    - Speech Understanding and AI for Sensory Analysis

    - Characterisation of Eating Behaviour through Video and Multimodal Analysis

    - Adapting Automatic Speech Recognition Systems to Pathological Contexts

    - Signal Processing and AI to Reveal Articulatory Disorders in Atypical Speech Production

    - End-To-End Speech Recognition For Assessing Comprehension Skills Of Children Learning To Read

    - Active Learning For Speaker Diarization

    - Automatic Modelling of Speech Rhythm

    - Transcription of Verbalisations for Discourse Analysis in Virtual Reality Scenarios

    - Implementation of a Comparative Speech Recognition Prototype for Oral Language Learning


    All details (topics, contacts) are available in the 'Jobs' section of the team's website:
  • 2024-01-04 15:20 | Anonymous

    Post-doc offer – Linguistics / computational linguistics


    Duration: 9 months

    Start: January or February 2024; a start in March 2024 is negotiable

    Location: LIUM – Le Mans Université

    Net salary: around €2,000/month, depending on experience

    Contact:

    Application: cover letter, CV (3 pages maximum)

    As part of the DIETS project, which focuses in particular on evaluation metrics for automatic speech recognition systems, a post-doc position is available to:

    a) carry out a linguistic and grammatical analysis of the errors in the output of automatic speech recognition systems

    b) run human evaluation tests for different types of errors

    c) compare the outcomes of these human evaluations with the assessments produced by automatic metrics

    d) publish the results (conferences, journals)



    The DIETS project


    One of the major problems with evaluation measures in language processing is that they are designed to score a proposed solution globally against a given reference, the main objective being to compare systems with one another. The choice of evaluation measures is often crucial, since the research undertaken to improve these systems is driven by those measures. Although automatic systems such as speech transcription are aimed at end users, those users are ultimately little studied: the impact of automatic errors on humans, and the way these errors are perceived at the cognitive level, has not been investigated and then integrated into the evaluation process.
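    For speech transcription, the global measure in question is typically the word error rate (WER): the edit distance between the reference and hypothesis word sequences, normalized by the reference length. A minimal self-contained sketch (the example sentences are illustrative):

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions) / reference length."""
        r, h = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words.
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[-1][-1] / len(r)

    print(wer("the cat sat down", "the bat sat down"))  # one substitution in four words -> 0.25
    ```

    WER treats every error identically, which is precisely the limitation DIETS addresses: a substitution that destroys meaning and one a reader barely notices both count as 1.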


    The DIETS project, funded by the Agence Nationale de la Recherche (2021-2024) and led by the Laboratoire Informatique d'Avignon, focuses on the diagnosis/evaluation of end-to-end automatic speech recognition (ASR) systems based on deep neural network architectures, integrating the human reception of transcription errors from a cognitive point of view. The challenge here is twofold:


        1) Finely analyse ASR errors on the basis of human reception.


        2) Understand and detect how these errors arise in an end-to-end ASR framework whose design is inspired by the workings of the human brain.


    The DIETS project aims to push the current limits of our understanding of end-to-end ASR systems and to initiate new research with a cross-disciplinary approach (computer science, linguistics, cognitive science...) that puts humans back at the centre of the development of automatic systems.



    Required skills


    The position requires the following skills: a good command of French spelling and grammar, needed to categorise the errors of the various transcription systems in an informed way, and basic computing skills, since the data will have to be retrieved from a server. A background in linguistics or computational linguistics is desirable.

    Experience in organising, running and analysing behavioural tests is a plus.


    Host laboratory


    The host institution is LIUM, the computer science laboratory of Le Mans Université, located in Le Mans. Regular presence at the laboratory is required throughout the post-doc. LIUM comprises two teams. The post-doc will take place in the LST team, whose research covers natural language processing for both text and speech. The team works with data-driven approaches and also specialises in deep learning applied to language processing. It currently comprises a project manager, 11 faculty members (computer scientists, acousticians, linguists), 4 doctoral researchers and two Master's apprentices.

  • 2024-01-04 15:19 | Anonymous

    Postdoctoral Scholar | Data Sciences and Artificial Intelligence at Penn State University

    The Data Sciences and Artificial Intelligence (DS/AI) group at Penn State invites applications for a Postdoctoral Scholar position, set to commence in Fall 2024. This role is centered on cutting-edge research at the nexus of machine learning, deep learning, computer vision, psychology, and biology, with foci on psychology-inspired AI and addressing significant biological questions using AI.

    Requirements:


    • Ph.D. in computer science, A.I., data science, physics, or neuroscience with an emphasis on machine learning, or a closely related field. To qualify, candidates must possess a Ph.D. or terminal degree before their employment starts at Penn State.

    • A strong record of publications in high-impact journals or premier peer-reviewed international conferences.

    • Prior experience in conducting interdisciplinary/multidisciplinary research is a plus.


    About the position:

    The successful candidate will be designated as a Postdoctoral Scholar at the College of Information Sciences and Technology (IST) of The Pennsylvania State University. The initial term of the position is for one year, with the possibility of renewal subject to performance and funding availability. The scholar will be engaged in two interdisciplinary projects funded by the National Science Foundation, receiving mentorship from Professors James Wang (IST), Brad Wyble (Psychology), and Charles Anderson (Biology). The scholar will collaborate with highly motivated and talented graduate students and benefit from strong career development support, including training in teaching, grant proposal writing, and other collaborative work. Qualified candidates will have the opportunity to teach in IST after successfully completing one semester, with approval from college leadership.


    To apply:

    • Please submit a CV, research statement (max 3 pages), and other pertinent documents in a single PDF document with the application.

    • Deadline: February 29, 2024, for full consideration. Late applications are accepted but given secondary priority.

    • Only shortlisted candidates will be contacted to provide reference letters.

    • For inquiries, please email Professor James Wang with the subject line “postdoc”, or visit the lab website.



    The College of IST is strongly committed to a diverse community and to providing a welcoming and inclusive environment for faculty, staff and students of all races, genders, and backgrounds. The College of IST is committed to making good faith efforts to recruit, hire, retain, and promote qualified individuals from underrepresented minority groups including women, persons of color, diverse gender identities, individuals with disabilities, and veterans. We invite applicants to address their engagement in or commitment to inclusion, equity, and diversity issues as they relate to broadening participation in the disciplines represented in the college as well as aligning with the mission of the College of IST in a separate statement.



    Pursuant to the Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act and the Pennsylvania Act of 1988, Penn State publishes a combined Annual Security and Annual Fire Safety Report (ASR). The ASR includes crime statistics and institutional policies concerning campus security, such as those concerning alcohol and drug use, crime prevention, the reporting of crimes, sexual assault, and other matters. The ASR is available for review here.


    Employment with the University will require successful completion of background check(s) in accordance with University policies. 



    Penn State is an equal opportunity, affirmative action employer, and is committed to providing employment opportunities to all qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. If you are unable to use our online application process due to an impairment or disability, please contact 814-865-1473.

  • 2024-01-04 15:18 | Anonymous

    Senior Data Scientist at the University of Chicago


    Please apply at


    About the Department

    The TMW Center for Early Learning + Public Health (TMW Center) develops science-based interventions, tools, and technologies to help parents and caregivers interact with young children in ways that maximize brain development. A rich language environment is critical to healthy brain development; however, few tools exist to measure the quality or quantity of these environments. Access to this type of data allows caregivers to enhance interactions in real time and gives policy-makers insight into how best to build policies with a population-level impact.

    The wearable team within TMW Center is building a low-cost wearable device that can reliably and accurately measure a child’s early language environment vis-à-vis the conversational turns between a child and caregiver. The goal is to provide accurate, real-time feedback that empowers parents and caregivers to create the best language environment for their children.

    Job Summary

    The job works independently to perform a variety of activities relating to software support and/or development. It involves analyzing, designing, developing, debugging, and modifying computer code for end-user applications, beta and general releases, and production support; guiding the development and implementation of applications, web pages, and user interfaces using a variety of software applications, techniques, and tools; and solving complex problems in the administration, maintenance, integration, and troubleshooting of the code and application ecosystem currently in production.

    We are searching for a strategic and inquisitive senior data scientist to develop and optimize innovative AI-based models focused on speech/audio processing. The senior data scientist is expected to outline requirements, brainstorm ideas and solutions with leadership, manage data integrity and conduct experiments, assign tasks to junior staff, and monitor performance of the team.



    • Formulates, suggests, and manages data-driven projects to support the development of audio algorithms and use cases.
    • Analyzes data from various entities for later use by junior data scientists.
    • Assesses scope and timelines, prioritizes goals, and prepares project plans to meet product and research objectives.
    • Delegates tasks to junior data scientists and provides coaching to improve quality of work.
    • Continuously trains and nurtures data scientists to take on bigger assignments.
    • Provides leadership in advancing the science of TMW Center interventions by generating new ideas and collaborating with the research analysis team.
    • In collaboration with the CTO, selects and guides decisions on statistical procedures and model selection, including conducting exploratory experiments to develop proofs of concept.
    • Cross-validates models to ensure generalization and predictability.
    • Stays informed about developments in data science and adjacent fields to ensure the most relevant methods and outputs are being leveraged.
    • Ensures data governance is in place to comply with regulations and privacy standards, and maintains documentation of methodologies, code, and results.
    • Designs new systems, features, and tools. Solves complex problems and identifies opportunities for technical improvement and performance optimization. Reviews and tests code to ensure appropriate standards are met.
    • Utilizes technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
    • Acts as a technical consultant and resource for faculty research, teaching, and/or administrative projects.
    • Performs other related work as needed.

    Minimum Qualifications


    Minimum requirements include a college or university degree in a related field.

    Work Experience:

    Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.



    Preferred Qualifications


    • Master’s degree in Computer Science, Statistics, Mathematics, or Economics with a focus on computer science.


    • Experience with Machine Learning and LLMs.
    • Experience working on audio or speech data.
    • Experience implementing edge models using TensorFlow micro, TensorFlow lite, and corresponding quantization techniques.
    • Experience building audio classification models or speech to text models.
    • Experience using the latest pre-trained models such as Whisper and wav2vec.
    • Proven experience taking an idea or user need and translating it into fully realized applications.
    • Ability to relay insights in layman’s terms to inform business decisions. 
    • 3+ years leading and managing junior data scientists.
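    The edge-model quantization mentioned above (as performed by, e.g., TensorFlow Lite's post-training int8 quantization) maps floating-point values to 8-bit integers via a scale and a zero point. A minimal stdlib sketch of the affine scheme — the function names and the [-1, 1] calibration range are illustrative, not taken from any particular toolkit:

    ```python
    def affine_params(xmin, xmax, qmin=-128, qmax=127):
        # Choose scale/zero_point so [xmin, xmax] maps onto [qmin, qmax],
        # following the affine scheme: real_value = scale * (q - zero_point).
        scale = (xmax - xmin) / (qmax - qmin)
        zero_point = int(round(qmin - xmin / scale))
        return scale, zero_point

    def quantize(x, scale, zero_point, qmin=-128, qmax=127):
        # Round to the nearest representable int8 value, clamping to range.
        return max(qmin, min(qmax, int(round(x / scale)) + zero_point))

    def dequantize(q, scale, zero_point):
        return scale * (q - zero_point)

    # Round-tripping a weight in range loses at most half a quantization step.
    scale, zp = affine_params(-1.0, 1.0)
    q = quantize(0.5, scale, zp)
    ```

    Real toolchains additionally calibrate xmin/xmax from representative data and quantize per-tensor or per-channel, but the arithmetic above is the core of the technique.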

    Technical Skills or Knowledge:

    • Proficiency in Python, PyTorch, TensorFlow, TinyML, pandas, and NumPy.
    • Experience with cloud environments such as AWS, Azure, or Google Cloud.
    • Experience with command line interfaces (Linux, SSH).
    • Experience processing large datasets with Spark, Dask or Ray.

    Application Documents

    • Resume (required)
    • Cover Letter (preferred)

    When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.

  • 2024-01-04 15:17 | Anonymous

    Title: Predictive Modeling of Subjective Disagreement in Speech Annotation/Evaluation

    Host laboratory: LIUM

    Location: Le Mans

    Supervisors: Meysam Shamsi, Anthony Larcher

    Beginning of internship: February 2024

    Application deadline: 10/01/2024

    Keywords: Subjective Disagreement Modeling, Synthetic Speech Quality Evaluation, Speech Emotion Recognition

    In the context of modeling subjective tasks, where diverse opinions, perceptions, and judgments exist among individuals, such as speech quality assessment or speech emotion recognition, defining the ground truth and annotating a training set becomes a crucial challenge. The current practice of aggregating all annotations into a single label for modeling a subjective task is neither fair nor efficient [1]. The variability in annotations or evaluations can stem from various factors [2], broadly categorized into those associated with corpus quality and those intrinsic to the samples themselves. In the first case, the delicate definition of a subjective task introduces sensitivity into the annotation process, potentially leading to more errors, especially where the annotation tools and platform lack precision or annotators experience fatigue. In the second case, the inherent ambiguity in defining a subjective task and differing perceptions may result in varying annotations and disagreements.

    Developing a predictive model of annotator/evaluator disagreement is crucial for engaging in discussions about ambiguous samples and refining the definition of subjective concepts. Such a model can also serve as a valuable tool for assessing the confidence of automatic evaluations [3,4]. This modeling approach will contribute to the automatic evaluation of corpus annotations, the identification of ambiguous samples for reconsideration or re-annotation, the automatic assessment of subjective models, and the detection of underrepresented samples and biases in the dataset.

    The proposed research involves using a speech dataset with multiple annotations per sample for a subjective task, such as MSP-Podcast [5], SOMOS [6], or VoiceMOS [7]. The primary objective is to predict the variation in assigned labels, measured through disagreement scores, entropy, or distribution.
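    The per-sample targets described above (disagreement scores, entropy, or the full label distribution) can be computed directly from the raw annotations. A minimal sketch using Shannon entropy over annotator labels — the function name and the toy labels are hypothetical:

    ```python
    from collections import Counter
    import math

    def disagreement_entropy(labels):
        # Shannon entropy (bits) of the label distribution for one sample:
        # 0 when all annotators agree, larger as opinions spread out.
        counts = Counter(labels)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Five hypothetical annotators labeling the perceived emotion of one clip:
    unanimous = ["happy"] * 5
    split = ["happy", "neutral", "happy", "sad", "neutral"]
    ```

    A predictive model would then be trained to regress this per-sample score (or the full annotation distribution) from the speech signal itself.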

    Reference: [1]. Davani, A. M., Díaz, M., & Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110.

    [2]. Kreiman, J., Gerratt, B. R., & Ito, M. (2007). When and why listeners disagree in voice quality assessment tasks. The Journal of the Acoustical Society of America, 122(4), 2354-2364.

    [3]. Wu, W., Chen, W., Zhang, C., & Woodland, P. C. (2023). It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation. arXiv preprint arXiv:2310.00486.

    [4]. Han, J., Zhang, Z., Schmitt, M., Pantic, M., & Schuller, B. (2017, October). From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty. In Proceedings of the 25th ACM international conference on Multimedia (pp. 890-897).

    [5]. Lotfian, R., & Busso, C. (2017). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471-483.

    [6]. Maniati, G., Vioni, A., Ellinas, N., Nikitaras, K., Klapsas, K., Sung, J. S., Jho, G., Chalamandaris, A., & Tsiakoulis, P. (2022). SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. Proc. Interspeech 2022, 2388-2392.

    [7]. Cooper, E., Huang, W. C., Tsao, Y., Wang, H. M., Toda, T., & Yamagishi, J. (2023). The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains. arXiv preprint arXiv:2310.02640.

    Applicant profile: Candidates motivated by artificial intelligence, enrolled in a Master's program in Computer Science or a related field.

    For application: send a CV and cover letter to : on or before 10/01/2024.


© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.
