[PhD] PhD position in Budgeted and Safe Reinforcement Learning at ONERA (Toulouse, France)

Filipo Perotto

Feb 20, 2023, 8:57:18 PM
to icaps-co...@googlegroups.com, ml-...@googlegroups.com, planni...@googlegroups.com, rl-...@googlegroups.com, bul...@irit.fr, jf...@loria.fr, pour...@risc.cnrs.fr

The French Aerospace Lab (ONERA) in Toulouse, in the south of France, is looking for a motivated and talented PhD candidate to work on both fundamental and applied RL research topics. The goal is to develop RL methods for survival situations, in which an autonomous agent must learn an optimal sequence of actions while interacting with a partially unknown environment, taking into account a risk of ruin.

Title: Reinforcement Learning under the Risk of Ruin

Profile and desired competences: The ideal candidate has a Research Master’s degree in Computer Science, Statistics, or Applied Mathematics, with proven experience and interest in Artificial Intelligence, specifically in the domains of Machine Learning, Reinforcement Learning, and/or Stochastic Planning. The successful candidate also has good programming skills and is comfortable writing scripts in Python and working with Jupyter notebooks. Previous experience with standard machine learning libraries is a plus. The ideal candidate has prior knowledge of the formal and mathematical models underlying Markov processes and Reinforcement Learning, and is able to read, understand, and write scientific papers. Curiosity, tenacity, autonomy, proactivity, and good communication skills are necessary qualities.

Subject: The thesis will focus on the class of problems called Survival RL, which can be considered the multi-state version of the Survival Multi-Armed Bandit. In the studied problem, the agent, without any prior knowledge of the reward and state transition functions, aims to learn an optimal policy constrained by a budget, which evolves over time with the received rewards and must remain positive throughout the process. The objective is to find a good trade-off between exploration (i.e. acting to learn new things), exploitation (i.e. acting optimally according to what is already known), and safety (i.e. managing the budget), seeking to maximize rewards over time efficiently while minimizing the risk of ruin.
The objective of this thesis is (a) to consolidate and generalize previous results concerning the multi-armed bandit model, (b) to extend those results to the reinforcement learning context, initially by modifying classical MDP-based algorithms, and then (c) to introduce the notion of survival into the deep reinforcement learning framework. Other directions can also be considered, such as partial observability, factored state representations, and several budgeted survival variables, which together constitute a multi-criteria, multi-constraint optimization problem. The theoretical and methodological findings must then be applied and validated on an aerospace problem, aerospace being ONERA's core business.
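
To make the budget dynamics of the survival bandit setting concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method studied in the thesis: rewards are assumed to be +1 or -1, the agent is a plain epsilon-greedy learner that ignores safety, and the names `run_survival_bandit` and `estimate_ruin_rate` are hypothetical.

```python
import random


def run_survival_bandit(arm_probs, initial_budget, horizon, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a toy survival bandit.

    Assumption: arm i pays +1 with probability arm_probs[i] and -1 otherwise.
    The budget starts at initial_budget, changes by each received reward,
    and the run ends in ruin as soon as the budget is no longer positive.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # running mean reward per arm
    budget = initial_budget

    for t in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # exploration
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploitation

        reward = 1 if rng.random() < arm_probs[arm] else -1
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

        budget += reward           # the budget evolves with the rewards
        if budget <= 0:            # ruin: the budget must remain positive
            return {"ruined": True, "steps": t + 1, "budget": budget}

    return {"ruined": False, "steps": horizon, "budget": budget}


def estimate_ruin_rate(arm_probs, initial_budget, horizon, runs=200):
    """Fraction of independent runs (seeds 0..runs-1) that end in ruin."""
    ruined = sum(
        run_survival_bandit(arm_probs, initial_budget, horizon, seed=s)["ruined"]
        for s in range(runs)
    )
    return ruined / runs


if __name__ == "__main__":
    # Hypothetical two-armed problem with a small initial budget.
    print(estimate_ruin_rate([0.45, 0.6], initial_budget=5, horizon=500))
```

Because this baseline explores without regard to the remaining budget, it can go bankrupt early even when a profitable arm exists, which is the kind of behavior a survival-aware method would aim to avoid.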

Keywords: Reinforcement Learning, Markov Decision Processes, Dynamic Programming, Planning, Stochastic Processes, Multi-Armed Bandits, Sequential Decision under Uncertainty, Risk of Ruin, Safe RL, Budgeted RL, Aerospace Simulation and Planning.

Timeline and Application Procedure: The thesis should normally start in October 2023 (the start date can be adjusted slightly) and last 3 years. Candidates are invited to send a CV and motivation letter by e-mail. Applications will be evaluated on a rolling basis, and the position will be filled as soon as a suitable candidate is found. The work is to be carried out on site in Toulouse. A good level of both French and English is required.

Contact: filipo.perotto[at]onera.fr

More information:

Filipo Studzinski Perotto
Research Engineer in Machine Learning for Discrete and Continuous Optimization
ONERA - TOULOUSE
Department: Traitement de l’Information et Systèmes (DTIS)
Unit: Systèmes Intelligents et Décision (SYD)
E-mail: filipo.perotto[at]onera.fr
Tel.: (+33) 5 62 25 26 00
ONERA - The French Aerospace Lab - Centre de Toulouse
2, avenue Edouard Belin - BP 74025 - 31055 TOULOUSE CEDEX
www.onera.fr/en/emails-terms