I have a follow-up question about evaluation. Are we considering:
(1) a low false-positive-rate setting like LiRA, where we test an adversary's ability to identify data in the forget set at a low false-positive rate, or
(2) the average case, e.g., overall attack accuracy (as used in the starter kit notebook) or the ROC-AUC of an adversary trying to identify data in the forget set?
There is likely a tradeoff between performance in the low-FPR setting and average-case performance when selecting unlearning algorithms, so it would be helpful to know which setting to focus on.
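To make the distinction concrete, here is a minimal sketch of how the two metrics can be computed from the same attack scores. The scores here are synthetic (hypothetical Gaussian attack outputs for forget-set members vs. non-members), just to illustrate that AUC and TPR at a fixed low FPR measure different things:

```python
# Sketch: the two evaluation settings computed from the same
# per-example membership scores. Scores are synthetic placeholders,
# not from any real attack.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Hypothetical attack scores: forget-set members vs. non-members.
member_scores = rng.normal(1.0, 1.0, 1000)
nonmember_scores = rng.normal(0.0, 1.0, 1000)

labels = np.concatenate([np.ones(1000), np.zeros(1000)])
scores = np.concatenate([member_scores, nonmember_scores])

# Setting (2): average-case separability over the whole ROC curve.
auc = roc_auc_score(labels, scores)

# Setting (1): LiRA-style, TPR at a fixed low FPR (e.g. 0.1%).
fpr, tpr, _ = roc_curve(labels, scores)
tpr_at_low_fpr = np.interp(1e-3, fpr, tpr)

print(f"ROC-AUC = {auc:.3f}, TPR@0.1%FPR = {tpr_at_low_fpr:.3f}")
```

An algorithm can look strong on AUC while still leaving a small set of confidently identifiable examples that dominate the low-FPR metric, which is the tradeoff in question.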
Thanks!
Andrew