Title:
Model-Value Self-Consistent Updates and Applications
Abstract:
Learned models of the environment provide reinforcement learning agents with flexible ways of making predictions about the environment. Models enable planning, i.e. using more computation to improve value functions or policies without requiring additional environment interactions. In this talk, we investigate a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent. The talk covers ways to use self-consistency updates both for policy evaluation and control (Farquhar et al. 20) and as a proxy for epistemic uncertainty in exploration (Filos et al. 22).
Short Bio:
Zita Marinho is a Research Scientist at DeepMind, where she currently works on reinforcement learning. She holds a dual PhD/MSc in Robotics from the Robotics Institute at Carnegie Mellon University and from IST, University of Lisbon, as part of the CMU/Portugal program. She received her MSc degree in Physics Engineering from Instituto Superior Técnico, Universidade de Lisboa, in 2010. Her research interests lie at the intersection of machine learning algorithms and natural language processing. She is particularly interested in how agents can interact and learn more effectively from those interactions. During her PhD, she studied spectral algorithms for sequence prediction and planning. She was jointly advised by Prof. André Martins at Unbabel/IST, Prof. Geoffrey Gordon at the Machine Learning Department/CMU, and Prof. Siddhartha Srinivasa from the University of Washington.