Beware what you wish for: Specification gaming and value alignment in AI and RL
AI techniques such as reinforcement learning are powerful methods for optimizing a given objective, but unfortunately humans are notoriously bad at specifying their objectives and constraints precisely, and at foreseeing the side effects of maximizing those objectives.
In reinforcement learning this problem is known as specification gaming, which is arguably a misnomer: the AI is simply, and blindly, optimizing the objective it was given. The more general framing is value alignment: how can we ensure that the AI's values align with ours?
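A minimal sketch of how specification gaming can arise (a hypothetical toy example, not part of the project itself): suppose a designer adds a small +1 "shaping" bonus for standing on a tile near the goal, intending to guide the agent toward the +10 goal reward. Under discounting, loitering on the bonus tile forever is worth more than ever finishing, so the reward-optimal policy games the specification. The environment, rewards, and names below are illustrative assumptions.

```python
# Toy MDP: a start state, a "bonus tile" (+1 per step, intended as shaping),
# and a terminal goal (+10). Value iteration finds the reward-optimal policy.
GAMMA = 0.99
STATES = ["start", "bonus_tile", "goal"]  # "goal" is terminal

# transition[state][action] = (next_state, reward)
transition = {
    "start":      {"advance": ("bonus_tile", 0.0), "stay": ("start", 0.0)},
    "bonus_tile": {"advance": ("goal", 10.0),      "stay": ("bonus_tile", 1.0)},
}

def value_iteration(n_iters=2000):
    V = {s: 0.0 for s in STATES}  # terminal "goal" keeps value 0
    for _ in range(n_iters):
        for s, acts in transition.items():
            V[s] = max(r + GAMMA * V[s2] for s2, r in acts.values())
    return V

V = value_iteration()
# Greedy policy with respect to the converged values
policy = {
    s: max(acts, key=lambda a: acts[a][1] + GAMMA * V[acts[a][0]])
    for s, acts in transition.items()
}
print(policy["bonus_tile"])  # prints "stay": the agent loiters instead of finishing
```

Staying on the bonus tile yields a discounted return of 1/(1-γ) = 100, which dominates the goal's +10, so the agent never completes the task even though it is perfectly optimizing the reward it was given.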
In this project we want to build a compelling example of specification gaming, to make the public further aware of this key existential risk of AI. Optionally, we can explore solutions, for example by accepting that objective specifications are initially flawed and letting humans adapt them along the way.
Requires students with experience in machine learning (ideally also deep learning and reinforcement learning) and an interest in responsible AI.