Post-doctoral and engineer positions
Starting date: July-September 2023
Duration: 24 months for the post-doc position and 12 months for the engineer position
Context
When a person's hands are busy with a task such as driving a car or piloting an airplane, voice is a fast and efficient interaction modality. In recent years, end-to-end deep-learning-based automatic speech recognition (ASR), which optimizes the probability of the output character sequence given an input speech signal, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020]. In aeronautical communications, the use of English is most often compulsory. Unfortunately, many pilots are not native English speakers and speak with an accent shaped by the pronunciation mechanisms of their native language. Inside an aircraft cockpit, the non-native voices of the pilots and the surrounding noise are the most difficult challenges to overcome in order to achieve efficient ASR. Non-native speech poses several challenges [Shi et al., 2021]: incorrect or approximate pronunciations, errors in gender and number agreement, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a further disturbing component to the speech signal. Much of the success of speech recognition therefore depends on the ability of ASR models to account for different accents and ambient noise.
Objectives
The recruited postdoc or engineer will develop methodologies and tools to achieve high-performance non-native ASR in the aeronautical context, more specifically in a (noisy) aircraft cockpit. He/she will build on an end-to-end ASR system based on wav2vec 2.0 [Baevski et al., 2020], a state-of-the-art self-supervised speech representation model.
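For illustration, the sketch below shows the kind of end-to-end pipeline the position involves: transcribing an utterance with a pretrained wav2vec 2.0 model fine-tuned for CTC-based recognition. It assumes the HuggingFace transformers and torchaudio libraries and a hypothetical input file utterance.wav; the project's actual system may use a different checkpoint or toolchain.

import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Publicly available English wav2vec 2.0 checkpoint fine-tuned with CTC.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load an utterance (hypothetical file) and resample to the 16 kHz rate
# the model was pretrained on.
waveform, sample_rate = torchaudio.load("utterance.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# Compute frame-level character logits and greedily decode them.
inputs = processor(waveform.squeeze(0), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])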
How to apply: Interested candidates are encouraged to contact Irina Illina (ill...@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters). Applications will be screened in accordance with the requirements of the French Directorate General of Armament (DGA).
Requirements & skills:
- Ph.D. degree in speech/audio processing, computer vision, machine learning, or a related field,
- ability to work independently as well as in a team,
- solid programming skills (Python, PyTorch) and deep learning knowledge,
- good level of written and spoken English.
References
[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960-4964, 2016.
[Chorowski et al., 2017] J. Chorowski and N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.
[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.
[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning (ICML), PMLR, pp. 2790-2799, 2019.
[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The Accented English Speech Recognition Challenge 2020: Open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918-6922, 2021.