Post-doctoral and engineer positions at Loria (France): Automatic speech recognition for non-native speakers in a noisy environment


Emmanuel Vincent

May 15, 2023, 2:59:44 AM
to illina
Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions
Starting date: July-September 2023
Duration: 24 months for a post-doc position and 12 months for an engineer position
Supervisors: Irina Illina, Associate Professor, Université de Lorraine, Multispeech Team, https://members.loria.fr/IIllina/
Emmanuel Vincent, Senior Research Scientist, Inria, Multispeech Team, http://members.loria.fr/evincent/

Context
When a person's hands are busy with a task such as driving a car or piloting an airplane, voice is a fast and efficient interaction modality. In recent years, end-to-end deep learning based automatic speech recognition (ASR), which optimizes the probability of the output character sequence given an input speech signal, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020]. In aeronautical communications, the English language is most often compulsory. Unfortunately, many pilots are not native English speakers and speak with an accent influenced by the pronunciation mechanisms of their native language. Inside an aircraft cockpit, the non-native speech of the pilots and the surrounding noise are the most difficult challenges to overcome in order to achieve efficient ASR. Non-native speech poses several challenges [Shi et al., 2021]: incorrect or approximate pronunciations, errors in gender and number agreement, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition therefore depends on how well the models used by ASR account for different accents and ambient noise.

Objectives
The recruited postdoc or engineer will develop methodologies and tools to achieve high-performance non-native ASR in the aeronautical context, more specifically in a (noisy) aircraft cockpit. He/she will build on an end-to-end ASR system using wav2vec 2.0 [Baevski et al., 2020], a state-of-the-art self-supervised representation of speech.

How to apply: Interested candidates are encouraged to contact Irina Illina (ill...@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).
Applications will be screened subject to the requirements of the French Directorate General of Armament (DGA).

Requirements & skills:
- Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,
- ability to work independently as well as in a team,
- solid programming skills (Python, PyTorch), and deep learning knowledge,
- good level of written and spoken English.

References
[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. Wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.
[Chorowski et al., 2017] J. Chorowski and N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.
[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.
[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.
[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

--
Emmanuel Vincent
Senior Research Scientist & Head of Science
Inria Nancy - Grand Est
+33 3 8359 3083 - http://members.loria.fr/evincent/