Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition

Takafumi Moriya, Jian Wang, Tomohiro Tanaka, Ryo Masumura, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono


We present a novel fully neural network (FNN)-based automatic speech recognition (ASR) system that addresses the out-of-vocabulary (OOV) problem. The most common approach to the OOV problem is to use character- or sub-word-level units as output symbols. Unfortunately, this approach is not suitable for Japanese and Mandarin Chinese, since their grapheme inventories are far larger than that of English. Our solution is an FNN-based ASR system that uses a pronunciation-based unit set together with dictionaries, i.e., word-to-pronunciation rules. A previous study proposed, for Mandarin Chinese, a greedy cascading decoder (GCD) that uses two neural converters: an acoustic-to-pronunciation (A2P) conversion model and a pronunciation-to-word (P2W) conversion model. However, that work searched only for the optimal pronunciation sequence, so the generated word sequence is not guaranteed to be optimal. In this paper, we propose a joint maximization decoder (JMD) that considers the joint probability of pronunciation and word sequences in beam-search decoding. Moreover, we introduce a neural-network-based joint source-channel model to improve A2P conversion performance. Experiments on Japanese ASR tasks demonstrate that JMD achieves better performance than GCD. Furthermore, we show the effectiveness of retraining the P2W conversion model with text-only language resources.
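The difference between the greedy cascade and joint maximization can be illustrated with a minimal sketch. The toy distributions below are hypothetical stand-ins for the neural A2P and P2W converters (the real system uses beam search over sequences); the example shows how greedily committing to the single best pronunciation can miss the word hypothesis with the highest joint probability.

```python
# Toy stand-ins for the two neural converters (assumed values, for illustration):
# P(pronunciation | audio) and P(word | pronunciation).
A2P = {"kouen": 0.6, "kooen": 0.4}          # acoustic-to-pronunciation scores
P2W = {
    "kouen": {"koen_park": 0.5, "koen_lecture": 0.5},
    "kooen": {"koen_performance": 0.9, "koen_support": 0.1},
}

def greedy_cascade():
    """GCD-style: commit to the single best pronunciation, then its best word."""
    p = max(A2P, key=A2P.get)
    w = max(P2W[p], key=P2W[p].get)
    return w, A2P[p] * P2W[p][w]

def joint_max():
    """JMD-style: maximize the joint score P(p|x) * P(w|p) over all (p, w) pairs."""
    return max(
        ((w, A2P[p] * P2W[p][w]) for p in A2P for w in P2W[p]),
        key=lambda pair: pair[1],
    )
```

Here the greedy cascade locks in the pronunciation "kouen" (0.6) and reaches a joint score of at most 0.6 × 0.5 = 0.30, while joint maximization finds "kooen" → "koen_performance" with 0.4 × 0.9 = 0.36, a higher-probability word hypothesis.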


DOI: 10.21437/Interspeech.2019-1558

Cite as: Moriya, T., Wang, J., Tanaka, T., Masumura, R., Shinohara, Y., Yamaguchi, Y., Aono, Y. (2019) Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition. Proc. Interspeech 2019, 4410-4414, DOI: 10.21437/Interspeech.2019-1558.


@inproceedings{Moriya2019,
  author={Takafumi Moriya and Jian Wang and Tomohiro Tanaka and Ryo Masumura and Yusuke Shinohara and Yoshikazu Yamaguchi and Yushi Aono},
  title={{Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4410--4414},
  doi={10.21437/Interspeech.2019-1558},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1558}
}