Beware what you wish for: Specification gaming and value alignment in AI and RL
AI techniques such as reinforcement learning are powerful methods for optimizing a given objective, but unfortunately humans are notoriously bad at specifying their objectives and constraints precisely, and at foreseeing the side effects of maximizing those objectives.
In reinforcement learning this problem is known as specification gaming, which is arguably a misnomer: the AI is simply, and blindly, optimizing the objective it was given. The more general framing is value alignment: how can we ensure that the AI's values align with ours?
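A minimal sketch of how specification gaming can arise (a hypothetical toy example, not part of the project itself): suppose a designer adds a small +1 "shaping" bonus for standing on a tile near the goal, intending to guide the agent toward the +10 goal reward. Under discounting, loitering on the bonus tile forever is worth more than ever finishing, so the reward-optimal policy games the specification. The environment, rewards, and names below are illustrative assumptions.

```python
# Toy MDP: a start state, a "bonus tile" (+1 per step, intended as shaping),
# and a terminal goal (+10). Value iteration finds the reward-optimal policy.
GAMMA = 0.99
STATES = ["start", "bonus_tile", "goal"]  # "goal" is terminal

# transition[state][action] = (next_state, reward)
transition = {
    "start":      {"advance": ("bonus_tile", 0.0), "stay": ("start", 0.0)},
    "bonus_tile": {"advance": ("goal", 10.0),      "stay": ("bonus_tile", 1.0)},
}

def value_iteration(n_iters=2000):
    V = {s: 0.0 for s in STATES}  # terminal "goal" keeps value 0
    for _ in range(n_iters):
        for s, acts in transition.items():
            V[s] = max(r + GAMMA * V[s2] for s2, r in acts.values())
    return V

V = value_iteration()
# Greedy policy with respect to the converged values
policy = {
    s: max(acts, key=lambda a: acts[a][1] + GAMMA * V[acts[a][0]])
    for s, acts in transition.items()
}
print(policy["bonus_tile"])  # prints "stay": the agent loiters instead of finishing
```

Staying on the bonus tile yields a discounted return of 1/(1-γ) = 100, which dominates the goal's +10, so the agent never completes the task even though it is perfectly optimizing the reward it was given.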
In this project we want to build a compelling example of specification gaming, to make the public further aware of this key existential risk of AI. Optionally, we can explore solutions, for example by accepting that objective specifications are initially flawed and letting humans adapt them along the way.
Requires students with experience in machine learning (ideally also deep learning and reinforcement learning) and an interest in responsible AI.