SWE-Glu SF: Understanding the Disharmony between Dropout and Batch Normalization


Cheikh Fiteni

Aug 15, 2024, 3:34:21 AM8/15/24
to SWE-Glu SF Papers Reading Group

Glu morning, 


Our next meeting will be this Saturday, August 17th, 2:30 PM @ 619 Oak Street.


This week's paper is "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift" by Xiang Li, Shuo Chen, Xiaolin Hu, and Jian Yang (2018): https://arxiv.org/abs/1801.05134


Why this is cool:

  1. Andrej Karpathy points to this paper in his "A Recipe for Training Neural Networks" as an answer to why two of the most common regularization techniques—dropout and batch normalization—do not play nice with each other.

  2. The paper very clearly formalizes how dropout shifts the variance of activations between training and inference, and why batch normalization's stored statistics are thrown off by that shift (see the short sketch right after this list).
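
For a concrete preview before you touch the math, here is a short, purely illustrative PyTorch sketch (ours, not from the paper) of the mismatch the authors call the variance shift: batch norm memorizes statistics of dropout-perturbed activations during training, but dropout is switched off at test time, so those statistics no longer match what batch norm actually receives.

import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)                              # inverted dropout: kept units are scaled by 1/(1-p)
bn = nn.BatchNorm1d(1, affine=False, momentum=None)   # momentum=None -> cumulative running statistics

x = torch.randn(10_000, 1)                            # roughly unit-variance input features

# Training: BN accumulates its running variance on the dropout-perturbed
# activations, whose variance is ~1/(1-p) = 2 for unit-variance input.
drop.train(); bn.train()
_ = bn(drop(x))
print("variance BN memorized during training:", bn.running_var.item())   # ~2.0

# Inference: dropout becomes the identity, so the variance actually arriving
# at BN is ~1.0 and no longer matches the statistic BN stored above.
drop.eval(); bn.eval()
print("variance BN actually sees at test time:", x.var().item())         # ~1.0

The 1/(1-p) factor in the comments is just the standard inverted-dropout scaling, not notation from the paper; the paper derives the shift in general and studies how it compounds through deeper networks.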


This paper contains a bit more math than previous weeks' picks, so we highly encourage you to skim it now and reach out if you have any concerns about the background needed to fully appreciate it. That said, it is very approachable to self-teach (thank you, Claude), and we can send you some great resources if you have questions.


Regardless of your mathematical maturity, we highly recommend trying the derivations for yourself, and I look forward to everyone sharing notes!


Thank you to the four of you who joined us at our second meeting.


Best,

Cheikh and Sasha


P.S. If you are somehow reading this email but are not on our listserv, join it here.

If you are already on our listserv, share it with your friends.

