You are invited to the November 2024 edition of the Guaranteed Safe AI Seminars.
Bayesian oracles and safety bounds
by Yoshua Bengio, Scientific Director, Mila & Full Professor, U. Montreal
November 14, 18:00-19:00 UTC
Join: https://lu.ma/4ylbvs75

Description: Could there be safety advantages to training a Bayesian oracle to do only that job, i.e., to estimate P(answer | question, data)? In what scenarios could such an AI cause catastrophic harm? Can we even use such an oracle as the intelligence engine of an agent, e.g., by sampling actions that help to achieve goals? What can go wrong even if we assume a perfect prediction of the Bayesian posterior, e.g., if the true explanatory theory is a minority voice in the Bayesian posterior regarding harm prediction? If such an oracle is estimated by a neural network with amortized inference, what could go wrong? Could the implicit optimization used to train the estimated posterior create loopholes with an optimistic bias regarding harm? And could we use such a Bayesian oracle to obtain conservative risk estimates, i.e., bounds on the probability of harm, that mitigate the imperfections of such an agent?
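To make the "minority voice" concern concrete, here is a minimal numerical sketch. It is not from the talk: the theories, their posterior weights, the harm probabilities, and the plausibility threshold epsilon are all invented for illustration, and the worst-case-over-plausible-theories rule shown is just one possible way to be conservative. The point is that a posterior-averaged harm estimate can look safe while a conservative bound does not:

```python
import numpy as np

# Hypothetical posterior over candidate "theories" (world models), each of
# which predicts a probability of harm for a proposed action. All numbers
# are invented for illustration.
posterior = np.array([0.05, 0.60, 0.35])   # P(theory_i | data)
harm_prob = np.array([0.90, 0.01, 0.02])   # P(harm | action, theory_i)

# Posterior-averaged estimate: can be optimistic when the true (harm-
# predicting) theory is a minority voice in the posterior.
mean_estimate = float(posterior @ harm_prob)

# One conservative alternative: take the worst case over every theory whose
# posterior mass exceeds a small plausibility threshold epsilon (assumed here).
epsilon = 0.01
plausible = posterior >= epsilon
conservative_bound = float(harm_prob[plausible].max())

print(f"posterior-mean harm estimate: {mean_estimate:.3f}")      # ~0.058
print(f"conservative harm bound:      {conservative_bound:.3f}")  # 0.900
```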