The French Aerospace Lab (ONERA) in Toulouse, in the south of France, is looking for a motivated and talented PhD candidate to work on both fundamental and applied reinforcement learning (RL) research topics. The aim is to develop RL methods for survival
situations, in which an autonomous agent must learn an optimal sequence of
actions while interacting with a partially unknown environment,
taking into account a risk of ruin.
Title: Reinforcement Learning under the Risk of Ruin
Profile and desired competences: The ideal candidate has a Research Master’s degree in Computer Science, Statistics, or Applied Mathematics, with proven experience and interest in Artificial Intelligence, specifically in the domains of Machine Learning, Reinforcement Learning, and/or Stochastic Planning. The successful candidate also has good programming skills and is comfortable writing scripts in Python and working with Jupyter notebooks. Previous experience with standard machine learning libraries is a plus. The ideal candidate already knows the formal and mathematical models underlying Markov processes and Reinforcement Learning, and is able to read, understand, and write scientific papers. Curiosity, tenacity, autonomy, proactivity, and good communication skills are necessary qualities.
Subject: The thesis will focus on the class of problems
called Survival RL, which can be seen as the multi-state version
of the Survival Multi-Armed Bandit. In the studied problem, the agent,
without any prior knowledge about the reward and state transition
functions, aims to learn an optimal policy constrained by a budget,
which evolves over time with the received rewards and must remain
positive throughout the process. The objective
is to find a good trade-off between exploration (i.e. acting to learn
new things), exploitation (i.e. acting in an optimal way according to
what is already known), and safety (i.e. managing the budget), seeking
to maximize rewards over time in an efficient way, while minimizing the
risk of ruin.
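
For illustration only, the bandit version of this setting can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method to be developed in the thesis: the function name run_survival_bandit, the rewards in {-1, +1}, and the epsilon-greedy policy are all hypothetical choices made for the example. The budget starts at a given value, is updated with each received reward, and the episode ends in ruin as soon as the budget is no longer positive.

```python
import random


def run_survival_bandit(success_probs, initial_budget, horizon, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a bandit with rewards in {-1, +1}.

    Hypothetical illustration: each arm pays +1 with its own probability and
    -1 otherwise. Returns (final_budget, steps_played, ruined).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # running mean reward per arm
    budget = initial_budget

    for t in range(horizon):
        # Exploration vs. exploitation: random arm with probability epsilon,
        # otherwise the arm with the best estimated mean reward.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: means[a])

        reward = 1 if rng.random() < success_probs[arm] else -1
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

        # Safety: the budget evolves with the rewards and must stay positive.
        budget += reward
        if budget <= 0:
            return budget, t + 1, True

    return budget, horizon, False
```

Note that this naive policy ignores the budget when choosing an arm; a survival-aware policy would presumably become more cautious as the budget approaches zero, which is the kind of trade-off the subject describes.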
The objective of this thesis is (a) to consolidate and
generalize previous results concerning the multi-armed bandit model,
(b) to extend those results to the reinforcement learning context,
initially by modifying classical MDP-based algorithms, and then (c)
to introduce the notion of survival into the deep reinforcement learning
framework. Other directions can also be considered, such as taking into
account partial observability, factored state representations, and
several budgeted survival variables, which together constitute a
multi-criteria and multi-constraint optimization problem. The theoretical and methodological findings must then be applied and validated on an aerospace problem, ONERA's core business.
Keywords: Reinforcement Learning, Markov Decision
Processes, Dynamic Programming, Planning, Stochastic Processes,
Multi-Armed Bandits, Sequential Decision under Uncertainty, Risk of
Ruin, Safe RL, Budgeted RL, Aerospace Simulation and Planning.
Timeline and Application Procedure: The
thesis is expected to start in October 2023 (the date can be adjusted
slightly if needed) and to last 3 years. Candidates are invited to send a CV and
motivation letter by e-mail.
Applications will be evaluated on a rolling basis and the position will be
filled as soon as a suitable candidate is found. The work is to be carried out on site in Toulouse. A good level of both French and English is required.
More information:
Filipo Studzinski Perotto
Research Engineer in Machine Learning for
Discrete and Continuous Optimization
ONERA - TOULOUSE
Department: Traitement de l’Information et Systèmes (DTIS)
Unit: Systèmes Intelligents et Décision (SYD)
E-mail: filipo.perotto[at]onera.fr
Tel.: (+33) 5 62 25 26 00
ONERA - The French Aerospace Lab - Centre de Toulouse
2, avenue Edouard Belin - BP 74025 - 31055 TOULOUSE CEDEX