ISCA Archive Interspeech 2014 Sessions Booklet

Interspeech 2014

14-18 September 2014

General Chair: Haizhou Li; General Co-Chair: Pak-Chung Ching
doi: 10.21437/Interspeech.2014

Phonetics and Phonology 1, 2

Acoustic correlates of phonological status
Maarten Versteegh, Amanda Seidl, Alejandrina Cristia

Parameterization of the glottal source with the phase plane plot
Manu Airaksinen, Paavo Alku

Transcribing tone: a likelihood-based quantitative evaluation of Chao's tone letters
Phil Rose

Intonational phonology and prosodic hierarchy in Malay
Diyana Hamzah, James Sneed German

Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian
Uwe D. Reichel, Katalin Mády

An evaluation of machine learning methods for prominence detection in French
George Christodoulides, Mathieu Avanzi

Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopy
Gang Chen, Soo Jin Park, Jody Kreiman, Abeer Alwan

Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation
Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec

The articulation of lexical and post-lexical palatalization in Korean
Jae-Hyun Sung

Articulation and neutralization: a preliminary study of lenition in Scottish Gaelic
Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie

Nasality in speech and its contribution to speaker individuality
Kanae Amino, Hisanori Makinae, Tatsuya Kitamura

Is speech rhythm an intrinsic property of language?
Jason Brown, Eden Matene

Where /ar/ the /r/s in standard Austrian German?
Anke Jackschina, Barbara Schuppler, Rudolf Muhr

Diphthongized vowels in the Yi County Hui Chinese dialect
Fang Hu, Minghui Zhang

Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristics
Volker Dellwo, Peggy Mok, Mathias Jenny

Listener estimation of speaker age based on whispered speech
Angelika Braun, Daniela Decker

The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise
Benjawan Kasisopa, Virginie Attina, Denis Burnham

Speech Production: Models and Acoustics

Motor control primitives arising from a learned dynamical systems model of speech articulation
Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan

Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition
Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu

A unified account of prominence effects in an optimization-based model of speech timing
Andreas Windmann, Juraj Šimko, Petra Wagner

Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan

Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition
Prasad Sudhakar, Prasanta Kumar Ghosh

Contribution of tongue lateral to consonant production
Jun Wang, William Katz, Thomas F. Campbell

A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin
Min Liu, Shuju Shi, Jinsong Zhang

Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic
Mohammad Abuoudeh, Olivier Crouzet

Corpus-testing a fricative discriminator; or, just how invariant is this invariant?
Philip J. Roberts, Henning Reetz, Aditi Lahiri

Modeling coarticulation in continuous speech
Brian O. Bush, Alexander Kain

On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences
Khalid Daoudi, Blaise Bertrac

Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change
Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber

Spoken Language Understanding

Theme identification in human-human conversations with features from specific speaker type hidden spaces
Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori

Learning phrase patterns for text classification using a knowledge graph and unlabeled data
Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf

Targeted feature dropout for robust slot filling in natural language understanding
Puyang Xu, Ruhi Sarikaya

Spoken question answering using tree-structured conditional random fields and two-layer random walk
Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee

Shrinkage based features for slot tagging with conditional random fields
Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong

Cluster based Chinese abbreviation modeling
Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang

Parsing named entity as syntactic structure
Xiantao Zhang, Dongchen Li, Xihong Wu

Detecting out-of-domain utterances addressed to a virtual personal assistant
Gokhan Tur, Anoop Deoras, Dilek Hakkani-Tür

Fusion of knowledge-based and data-driven approaches to grammar induction
Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides Petrakis, Alexandros Potamianos

Improving named entity recognition with prosodic features
Denys Katerenchuk, Andrew Rosenberg

Neural network models for lexical addressee detection
Suman V. Ravuri, Andreas Stolcke

Manipulating stance and involvement using collaborative tasks: an exploratory comparison
Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard Wright, Mari Ostendorf, Victoria Zayats

Speech Production I, II

Automatic estimation of the lip radiation effect in glottal inverse filtering
Manu Airaksinen, Tom Bäckström, Paavo Alku

Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds
Marcelo de Oliveira Rosa

Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura

A study of invariant properties and variation patterns in the converter/distributor model for emotional speech
Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan

A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation
Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer

Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model
Tokihiko Kaburagi

A real-time MRI study of articulatory setting in second language speech
Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan

Retroflex and bunched English /r/ with physical models of the human vocal tract
Takayuki Arai

Parameterization of articulatory pattern in speakers with ALS
Panying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. Green

Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother
Sujith P, Prasanta Kumar Ghosh

Palate-referenced articulatory features for acoustic-to-articulator inversion
An Ji, Michael T. Johnson, Jeff Berry

A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography
Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi

INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)

The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load
Björn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang

Filtering and subspace selection for spectral features in detecting speech under physical stress
Jouni Pohjalainen, Paavo Alku

Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens
Ming Li

Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction
Heysem Kaya, Tuğçe Özkaptan, Albert Ali Salah, Sadık Fikret Gürgen

Ensemble of machine learning algorithms for cognitive and physical speaker load detection
How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang

Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks
Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth

High-level speech event analysis for cognitive load classification
Claude Montacié, Marie-José Caraty

On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels
Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma

Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 computational paralinguistics challenge
Mark Huckvale

The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge
Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Le, Eliathamby Ambikairajah

Classification of cognitive load from speech using an i-vector framework
Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S. Narayanan

Hearing and Perception

Revisiting the right-ear advantage for speech: implications for speech displays
Nandini Iyer, Eric Thompson, Brian Simpson, Griffin Romigh

Comparing reaction time sequences from human participants and computational models
L. ten Bosch, Miriam Ernestus, Lou Boves

Detecting the number of competing speakers — human selective hearing versus spectrogram distance based estimator
Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu

The influence of sensory memory and attention on the context effect in talker normalization
Guo Li, Gang Peng

Automatic speech recognition with primarily temporal envelope information
Payton Lin, Fei Chen, Syu Siang Wang, Ying-Hui Lai, Yu Tsao

An adaptive envelope compression strategy for speech processing in cochlear implants
Ying-Hui Lai, Fei Chen, Yu Tsao

Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI
Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer, Kristin Heaton

A hearing impairment simulation method using audiogram-based approximation of auditory characteristics
Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages
Dongmei Wang, James M. Kates, John H. L. Hansen

Vowel spectral contributions to English and Mandarin sentence intelligibility
Daniel Fogerty, Fei Chen

Significance of aperiodicity in the pitch perception of expressive voices
Vinay Kumar Mittal, B. Yegnanarayana

Cross-Linguistic Studies

DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages
Mirjam Wester, María Luisa García Lecumberri, Martin Cooke

Cross-linguistic investigations of oral and silent reading
Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico

Non-native word recognition in noise: the role of word-initial and word-final information
Juul Coumans, Roeland van Hout, Odette Scharenborg

The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels
Janice Wing Sze Wong

Dutch vowel production by Spanish learners: duration and spectral features
Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik

English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory
Angelos Lengeris, Katerina Nicolaidis

Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French
Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi

Perception of prosodic prominence and boundaries by L1 and L2 speakers of English
Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi

Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children
Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard

Phoneme category retuning in a non-native language
Polina Drozdova, Roeland van Hout, Odette Scharenborg

Speech emotion recognition with cross-lingual databases
Bo-Chang Chiou, Chia-Ping Chen

Speaker Diarization

Speaker diarization using eye-gaze information in multi-party conversations
Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara

Unsupervised speaker diarization using Riemannian manifold clustering
Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Towards a complete binary key system for the speaker diarization task
Héctor Delgado, Corinne Fredouille, Javier Serrano

An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives
Houman Ghaemmaghami, David Dean, Sridha Sridharan

Speaker diarization using gesture and speech
Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes

Is incremental cross-show speaker diarization efficient for processing large volumes of data?
Grégor Dupuy, Sylvain Meignier, Yannick Estève

Detecting and labeling speakers on overlapping speech using vector Taylor series
Pranay Dighe, Marc Ferràs, Hervé Bourlard

Phoneme background model for information bottleneck based speaker diarization
Sree Harsha Yella, Petr Motlicek, Hervé Bourlard

Diarizing large corpora using multi-modal speaker linking
Marc Ferràs, Stefano Masneri, Oliver Schreer, Hervé Bourlard

Multimodal understanding for person recognition in video broadcasts
Frederic Bechet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoit Favre, Mickael Rouvier, Remi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Gregory Senay, Pierre Tirilly

Robust ASR 1, 2

Comparing time-frequency representations for directional derivative features
James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan

Robust speech recognition with speech enhanced deep neural networks
Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee

An investigation of likelihood normalization for robust ASR
Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer

Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system
Constantin Spille, Bernd T. Meyer

Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling
Jürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll

Joint adaptation and adaptive training of TVWR for robust automatic speech recognition
Shilin Liu, Khe Chai Sim

Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression
Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern

Variable-component deep neural network for robust speech recognition
Rui Zhao, Jinyu Li, Yifan Gong

Effective modulation spectrum factorization for robust speech recognition
Yu-Chen Kao, Yi-Ting Wang, Berlin Chen

Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR
Suman V. Ravuri

Robust speech recognition using temporal masking and thresholding algorithm
Chanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. Stern

Deep neural network bottleneck features for generalized variable parameter HMMs
Xurong Xie, Rongfeng Su, Xunying Liu, Lan Wang

A novel dynamic parameters calculation approach for model compensation
Suliang Bu, Yanmin Qian, Kai Yu

Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization
Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Noise robust speech recognition based on noise-adapted HMMs using speech feature compensation
Yong-Joo Chung

Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition
M. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas O'Shaughnessy

Speech Synthesis I-III

Using conditional random fields to predict focus word pair in spontaneous spoken English
Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai

Applications of maximum entropy rankers to problems in spoken language processing
Richard Sproat, Keith Hall

Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models
Xavi Gonzalvo, Monika Podsiadło

Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis
Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages
B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan

An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis
Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre

Chaotic mixed excitation source for speech synthesis
Hemant A. Patil, Tanvina B. Patel

Refined inter-segment joining in multi-form speech synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet

A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis system
Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou

Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression
Diandra Fabre, Thomas Hueber, Pierre Badin

Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models
Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti

Speech-driven head motion synthesis using neural networks
Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu

Text-independent voice conversion using speaker model alignment method from non-parallel speech
Peng Song, Yun Jin, Wenming Zheng, Li Zhao

Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes
Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai

Hierarchical modeling of F0 contours for voice conversion
Gerard Sanchez, Hanna Silen, Jani Nurminen, Moncef Gabbouj

Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours
Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka

An iterative approach to decision tree training for context dependent speech synthesis
Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson

Prosodic phrasing modeling for Vietnamese TTS using syntactic information
Thi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro

Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling
Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi

Reconstruction of mistracked articulatory trajectories
Qiang Fang, Jianguo Wei, Fang Hu

Enabling controllability for continuous expression space
Langzhou Chen, Norbert Braunschweiler

Analysis of spectral enhancement using global variance in HMM-based speech synthesis
Takashi Nose, Akinori Ito

Intelligibility analysis of fast synthesized speech
Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi

Speech synthesis reactive to dynamic noise environmental conditions
Susana Palmaz López-Peláez, Robert A. J. Clark

Partial representations improve the prosody of incremental speech synthesis
Timo Baumann

Dialogue context sensitive speech synthesis using factorized decision trees
Pirros Tsiakoulis, Catherine Breslin, M. Gašić, Matthew Henderson, Dongho Kim, Steve Young

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis
Xin Wang, Zhen-Hua Ling, Li-Rong Dai

On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech
Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo

Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergence
C.-T. Do, M. Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J.-L. Crebouw

Speech intonation for TTS: study on evaluation methodology
Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales

Feature Extraction and Modeling for ASR 1, 2

Acoustic modeling with deep neural networks using raw time signal for LVCSR
Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney

Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena

Deep scattering spectra with deep neural networks for LVCSR tasks
Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo

Robust CNN-based speech recognition with Gabor filter kernels
Shuo-Yiin Chang, Nelson Morgan

Probabilistic linear discriminant analysis with bottleneck features for speech recognition
Liang Lu, Steve Renals

Evaluating speech features with the minimal-pair ABX task (II): resistance to noise
Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis Bach, Hynek Hermansky, Emmanuel Dupoux

Investigating NMF speech enhancement for neural network based acoustic models
Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll

Automatic speech feature classification for children with cochlear implants
Jason Lilley, James Mahshie, H. Timothy Bunnell

Sequential maximum mutual information linear discriminant analysis for speech recognition
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Model and feature based compensation for whispered speech recognition
Shabnam Ghaffarzadegan, Hynek Bořil, John H. L. Hansen

Post-masking: a hybrid approach to array processing for speech recognition
Amir R. Moghimi, Bhiksha Raj, Richard M. Stern

ASR feature extraction with morphologically-filtered power-normalized cochleograms
F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, C. Peláez-Moreno

Should deep neural nets have ears? the role of auditory features in deep learning approaches
Angel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer

Extending Limabeam with discrimination and coarse gradients
Charles Fox, Thomas Hain

Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali language
Sankar Mukherjee, Shyamal Kumar Das Mandal

Room localization for distant speech recognition
Juan A. Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin

Posterior-based sparse representation for automatic speech recognition
Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard

Speech Analysis I, II

Lateral formants in three Central Australian languages
Marija Tabain, Andrew Butcher, Gavan Breen, Richard Beare

Detecting articulatory compensation in acoustic data through linear regression modeling
Alina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson

The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers
Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich

Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording
Nisha Meenakshi, Chiranjeevi Yarra, B. K. Yamini, Prasanta Kumar Ghosh

Impact of age in the production of European Portuguese vowels
Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sa-Couto, João Freitas, Miguel Sales Dias

'Houston, we have a solution': a case study of the analysis of astronaut speech during NASA Apollo 11 for long-term speaker modeling
Chengzhu Yu, John H. L. Hansen, Douglas W. Oard

Relating automatic vowel space estimates to talker intelligibility
Yi Luan, Richard Wright, Mari Ostendorf, Gina-Anne Levow

Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation
Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino

Sparse time-frequency representation of speech by the Vandermonde transform
Christian Fischer Pedersen, Tom Bäckström

Analysis and identification of human scream: implications for speaker recognition
Mahesh Kumar Nandwana, John H. L. Hansen

F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification
Dongmei Wang, Philipos C. Loizou, John H. L. Hansen

The influence of pitch and noise on the discriminability of filterbank features
Malcolm Slaney, Michael L. Seltzer

Speech Processing with Multi-Modalities

Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons
Ahmed Hussen Abdelaziz, Dorothea Kolossa

Lipreading using convolutional neural network
Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata

Lipreading approach for isolated digits recognition under whisper and neutral speech
Fei Tao, Carlos Busso

Multimodal exemplar-based voice conversion using lip features in noisy environments
Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki

Towards a practical silent speech recognition system
Yunbin Deng, James T. Heaton, Geoffrey S. Meltzner

Enhancing multimodal silent speech interfaces with feature selection
João Freitas, Artur Ferreira, Mário Figueiredo, António Teixeira, Miguel Sales Dias

Opti-speech: a real-time, 3D visual feedback system for speech training
William Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker

Across-speaker articulatory normalization for speaker-independent silent speech recognition
Jun Wang, Ashok Samal, Jordan R. Green

Conversion from facial myoelectric signals to speech: a unit selection approach
Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz

Towards real-life application of EMG-based speech recognition by using unsupervised adaptation
Michael Wand, Tanja Schultz

Simple gesture-based error correction interface for smartphone speech recognition
Yuan Liang, Koji Iwano, Koichi Shinoda

Cross-Lingual and Adaptive Language Modeling

Development of bilingual ASR system for MediaParl corpus
Petr Motlicek, David Imseng, Milos Cernak, Namhoon Kim

Investigation of cross-lingual bottleneck features in hybrid ASR systems
Jie Li, Rong Zheng, Bo Xu

Language identification of individual words with joint sequence models
Oluwapelumi Giwa, Marelie H. Davel

Audio-to-text alignment for speech recognition with very limited resources
Xavier Anguera, Jordi Luque, Ciro Gracia

A minimal-resource transliteration framework for Vietnamese
Hoang Gia Ngo, Nancy F. Chen, Sunil Sivadas, Bin Ma, Haizhou Li

Combining recurrent neural networks and factored language models during decoding of code-switching speech
Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz

Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages
Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney

Mixture of latent words language models for domain adaptation
Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi

Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web search
Robert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl

The nested Indian buffet process for flexible topic modeling
Jen-Tzung Chien, Ying-Lan Chang

Automated closed captioning for Russian live broadcasting
K. Levin, I. Ponomareva, A. Bulusheva, G. Chernykh, I. Medennikov, N. Merkin, A. Prudnikov, Natalia Tomashenko

Show and Tell Session 1, 1

3D tongue motion visualization based on ultrasound image sequences
Kele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, A. Amelot, S. K. Al Kork, L. Crevier-Buchman, P. Chawah, G. Dreyfus, T. Fux, C. Pillot-Loiseau, P. Roussel, M. Stone, B. Denby

Listen with your skin: Aerotak speech perception enhancement system
Donald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay

Speech assistant system
László Czap

Spoken dialogue system for restaurant recommendation and reservation
Rafael E. Banchs, Seokhwan Kim

Interlingual map task corpus collection
Hayakawa Akira, Nick Campbell, Saturnino Luz

A client mobile application for Chinese-Spanish statistical machine translation
Jordi Centelles, Marta R. Costa-jussà, Rafael E. Banchs

LuciaWebGL: a new WebGL-based talking head
Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci

Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languages
Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller

On the use of the 'Pure Data' programming language for teaching and public outreach in speech processing
Roger K. Moore

Syncwords: a platform for semi-automated closed captioning and subtitles
Aleksandr Dubinsky

Robert A. J. Clark

An educational platform to capture, visualize and analyze rare singing
P. Chawah, S. K. Al Kork, T. Fux, Martine Adda-Decker, A. Amelot, N. Audibert, B. Denby, G. Dreyfus, A. Jaumard-Hakoun, C. Pillot-Loiseau, P. Roussel, M. Stone, Kele Xu, L. Crevier-Buchman

Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptation
Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi

Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singer
Dieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo

Iterative refinement of amplitude and phase in single-channel speech enhancement
Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi

elite-HTS: a NLP tool for French HMM-based speech synthesis
Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit

SARA: Singapore's automated responsive assistant for the touristic domain
Andreea I. Niculescu, Rafael E. Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar

The speech recognition virtual kitchen: launch party
Andrew Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates

System for automated speech and language analysis (SALSA)
Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, Thomas Christie, Serguei Pakhomov

Pronunciation practice support system for children who have difficulty correctly pronouncing words
Ikuyo Masuda-Katsuse

Automated production of true-cased punctuated subtitles for weather and news broadcasts
Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals

I2R Speech2Singing perfects everyone's singing
Minghui Dong, S. W. Lee, Haizhou Li, Paul Chan, Xuejian Peng, Jochen Walter Ehnes, Dongyan Huang

Spoken Term Detection and Document Retrieval

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection
Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Recent improvements in SRI's keyword detection system for noisy audio
Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco

Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries
Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai

Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machines
Raghavendra Reddy Pappagari, Shekhar Nayak, K. Sri Rama Murty

Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warping
Basil George, Abhijeet Saxena, Gautam Mantena, Kishore Prahallad, B. Yegnanarayana

An empirical study of multilingual and low-resource spoken term detection using deep neural networks
Jie Li, Xiaorui Wang, Bo Xu

Diagnostic techniques for spoken keyword discovery
Peter Schulam, Murat Akbacak

Robust retrieval models for false positive errors in spoken documents
Sho Kawasaki, Tomoyosi Akiba

Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features
Yuan-ming Liou, Yi-sheng Fu, Hung-yi Lee, Lin-shan Lee

Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion
Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot

Semantically based search in a social speech task
Fernando García, Emilio Sanchis, Ferran Pla

Prosody and Paralinguistic Information

Study of changes in glottal vibration characteristics during laughter
Vinay Kumar Mittal, B. Yegnanarayana

On predicting the unpleasantness level of a sound event
Stavros Ntalampiras, Ilyas Potamitis

Predicting when to laugh with structured classification
Bilal Piot, Olivier Pietquin, Matthieu Geist

Conversational structures affecting auditory likeability
Benjamin Weiss, Katrin Schoenenberg

Towards the adaptation of prosodic models for expressive text-to-speech synthesis
Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus
Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

Learning L2 prosody is more difficult than you realize — F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 Mandarin
Chiu-yu Tseng, Chao-yu Su

Investigating prosodic relations between initiating and responding laughs
Khiet P. Truong, Jürgen Trouvain

Application of image processing methods to filled pauses detection from spontaneous speech
Dmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth

Perception of sentence stress in English infant directed speech
Sofoklis Kakouros, Okko Räsänen

Automatic recognition of attitudes in video blogs — prosodic and visual feature analysis
Noor Alhusna Madzlan, JingGuang Han, Francesca Bonin, Nick Campbell

“Was that your mother on the phone?”: classifying interpersonal relationships between dialog participants with lexical and acoustic properties
Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg

Deep Neural Networks for Speech Generation and Synthesis (Special Session)

DNN-based stochastic postfilter for HMM-based speech synthesis
Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling

Statistical parametric speech synthesis using weighted multi-distribution deep belief network
Shiyin Kang, Helen Meng

TTS synthesis with bidirectional LSTM based recurrent neural networks
Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. Soong

Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku

An introduction to computational networks and the computational network toolkit (invited talk)
Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoff Zweig, Chris Rossbach, Jon Currey

Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks
Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai

High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki

Sequence error (SE) minimization training of neural network for voice conversion
Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. Soong, Haifeng Li

Robust articulatory speech synthesis using deep neural networks for BCI applications
Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert

Speech and Language Processing — General Topics

Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems
Haihua Xu, Hang Su, Eng Siong Chng, Haizhou Li

A big data approach to acoustic model training corpus selection
Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan

Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera
Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. Glass, Stephan Vogel

rwthlm — the RWTH Aachen University neural network language modeling toolkit
Martin Sundermeyer, Ralf Schlüter, Hermann Ney

Language modeling with sum-product networks
Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming A. Chai

Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA Babel program
Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel

Cross-language transfer of semantic annotation via targeted crowdsourcing
Shammur Absar Chowdhury, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas

Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding
Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, Geoff Zweig

Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch
Philip N. Garner, David Imseng, Thomas Meyer

Building resources for Algerian Arabic dialects
S. Harrat, K. Meftouh, M. Abbas, K. Smaili

Adaptation 1, 2

Adaptation of deep neural network acoustic models using factorised i-vectors
Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland

Regularized feature-space discriminative adaptation for robust ASR
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, Vaibhava Goel

Towards speaker adaptive training of deep neural network acoustic models
Yajie Miao, Hao Zhang, Florian Metze

Component structuring and trajectory modeling for speech recognition
Arseniy Gorin, Denis Jouvet

Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition
Rama Doddipatla, Madina Hasan, Thomas Hain

Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation
Zhao You, Bo Xu

Speaker age estimation for elderly speech recognition in European Portuguese
Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias

Unsupervised model selection for recognition of regional accented speech
Maryam Najafian, Andrea DeMarco, Stephen Cox, Martin Russell

Speaker adaptation based on sparse and low-rank eigenphone matrix estimation
Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li

Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation
Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong

A low complexity model adaptation approach involving sparse coding over multiple dictionaries
S. Shahnawazuddin, Rohit Sinha

Effect of frequency weighting on MLP-based speaker canonicalization
Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

Feature space maximum a posteriori linear regression for adaptation of deep neural networks
Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng, Chin-Hui Lee

Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing
Natalia Tomashenko, Yuri Khokhlov

BUT 2014 Babel system: analysis of adaptation in NN based systems
Martin Karafiát, František Grézl, Karel Veselý, Mirko Hannemann, Igor Szőke, Jan Černocký

Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?
Mickael Rouvier, Benoit Favre

Speech Representation, Detection and Classification

Phone classification by a hierarchy of invariant representation layers
Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio

A semi-Markov model for speech segmentation with an utterance-break prior
Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes

Speech detection in transient noises
G. Aneeja, B. Yegnanarayana

Evaluation of dictionary for sparse coding in speech processing
Yongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han

Joint filtering and factorization for recovering latent structure from noisy speech data
Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan

A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
A. Gallardo-Antolín, J. M. Montero, Simon King

Read and spontaneous speech classification based on variance of GMM supervectors
Taichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi

Co-channel speech detection via spectral analysis of frequency modulated sub-bands
Navid Shokouhi, Seyed Omid Sadjadi, John H. L. Hansen

Word-level invariant representations from acoustic waveforms
Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio

On closed form calculation of line spectral frequencies (LSF)
Paul Dalsgaard, Ove Andersen

Robust features for content-based audio copy detection
Chahid Ouali, Pierre Dumouchel, Vishwa Gupta

Binaural deep neural network classification for reverberant speech segregation
Yi Jiang, DeLiang Wang, RunSheng Liu

Spoken Term Detection for Low-Resource Languages I, II

Query-by-example spoken term detection on multilingual unconstrained speech
Xavier Anguera, Luis Javier Rodriguez-Fuentes, Igor Szőke, Andi Buzo, Florian Metze, Mikel Penagarikano

A comparison of multiple methods for rescoring keyword search lists for low resource languages
Victor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg

Subword and phonetic search for detecting out-of-vocabulary keywords
Damianos Karakos, Richard Schwartz

An in-depth comparison of keyword specific thresholding and sum-to-one score normalization
Yun Wang, Florian Metze

Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages
Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. Glass

Developing STT and KWS systems using limited language resources
Viet-Bac Le, Lori Lamel, Abdel Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy

Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain

Strategies for rescoring keyword search results using word-burst and acoustic features
Min Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg

Word-based probabilistic phonetic retrieval for low-resource spoken term detection
Di Xu, Florian Metze

A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling
I-Fan Chen, Nancy F. Chen, Chin-Hui Lee

Combination of FST and CN search in spoken term detection
Justin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. Rudnicky

Low-resource open vocabulary keyword search using point process models
Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur

Speech Enhancement (Single- and Multi-Channel) 1, 2

A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancement
Shengkui Zhao, Douglas L. Jones

Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement
Neehar Jathar, Preeti Rao

Dynamic noise aware training for speech enhancement based on deep neural networks
Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee

Microphone array post-filtering using supervised machine learning for speech enhancement
Pasi Pertilä, Joonas Nikunen

Novel speech duration modifier for packet based communication system
Senthil Kumar Mani, Jitendra Kumar Dhiman, K. Sri Rama Murty

Experiments on deep learning for speech denoising
Ding Liu, Paris Smaragdis, Minje Kim

Single-channel dynamic exemplar-based speech enhancement
Nasser Mohammadiha, Simon Doclo

Using hidden Markov models for speech enhancement
Akihiro Kato, Ben Milner

Blind source extraction based on a direction-dependent a-priori SNR
Lukas Pfeifenberger, Franz Pernkopf

Least squares phase estimation of mixed signals
Carlos Eduardo Cancino Chacón, Pejman Mowlaee

Speech enhancement from additive noise and channel distortion — a corpus-based approach
Ji Ming, Danny Crookes

Multi-channel speech enhancement using sparse coding on local time-frequency structures
Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao

Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications
Seyedmahdad Mirsamadi, John H. L. Hansen

Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition
Zhuo Chen, Brian McFee, Daniel P. W. Ellis

Multiple-order non-negative matrix factorization for speech enhancement
Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard

NMF-based speech enhancement incorporating deep neural network
Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim

A data-driven approach to speech enhancement using Gaussian process
Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin



Multi-Lingual ASR

Prosody Processing

Speaker Recognition — Applications

Phonetics and Phonology 1, 2

Open Domain Situated Conversational Interaction (Special Session)

Speech Production: Models and Acoustics

Extraction of Para-Linguistic Information

Spoken Language Understanding

Spoken Dialogue Systems

DNN Architectures and Robust Recognition

Speaker Recognition — Evaluation and Forensics

Speech Production I, II

INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)

Hearing and Perception

Cross-Linguistic Studies

Speaker Diarization

Robust ASR 1, 2

Implementation of Language Model Algorithms

Speaker Recognition — Noise and Channel Robustness

Speech Synthesis I-III

Multi-Lingual Cross-Lingual and Low-Resource ASR

Speech Estimation and Sound Source Separation

Feature Extraction and Modeling for ASR 1, 2

Speech Analysis I, II

Speech Technologies and Applications

Source Separation and Computational Auditory Scene Analysis

Speech Technologies for Ambient Assisted Living (Special Session)


Speaker Recognition — General Topics

Speech Processing with Multi-Modalities

Normalization and Discriminative Training Methods

Paralinguistic and Extralinguistic Information

Text Processing for Speech Synthesis

Cross-language Perception and Production

Text-Dependent Speaker Verification With Short Utterances (Special Session)

Speech and Audio Analysis

Cross-Lingual and Adaptive Language Modeling

Pronunciation Modeling and Learning

Show and Tell Session 1, 1

Statistical Parametric Speech Synthesis

Voice Activity Detection

Disordered Speech

Speech and Multimodal Resources

Phase Importance in Speech Processing Applications (Special Session)

Spoken Term Detection and Document Retrieval

Prosody and Paralinguistic Information

Features and Robustness in Speaker and Language Recognition

Topic Spotting and Summarization of Spoken Documents

DNN Learning

Perception of Emotion and Prosody

Deep Neural Networks for Speech Generation and Synthesis (Special Session)

Speech Analysis and Perception

Intelligibility Enhancement and Predictive Measures

Speech and Language Processing — General Topics

Language, Dialect and Accent Recognition

Adaptation 1, 2

Speaker Localization

Speech Representation, Detection and Classification

Spoken Term Detection for Low-Resource Languages I, II

Voice Conversion

Speech and Audio Segmentation and Classification

Language Acquisition

Speech Perception

Language and Lexical Modeling

Speech Enhancement (Single- and Multi-Channel) 1, 2

Speech Coding and Transmission

Unsupervised or Corrective Lexical Modeling

Meta Data

Language Recognition