OxCSML seminar this Friday 21 Apr 2023

Hai Dang Dau

Apr 18, 2023, 6:33:43 PM
to oxcsml...@googlegroups.com
Dear all,

This Friday we welcome three students from Sanjeev Arora’s lab at Princeton University to present their work in our OxCSML seminar. Please find all the details below.

Kind regards,
Hai-Dang

===============

Date and time: Friday 21 Apr 2023, 12:30 - 14:00 BST
Location: Large Lecture Theatre, Department of Statistics, University of Oxford
Zoom link (via Eventbrite registration): https://www.eventbrite.fr/e/oxcsml-seminar-friday-21-apr-2023-tickets-620226753917

Why (and When) does Local SGD Generalize Better than SGD?
Kaifeng Lyu, Princeton University
(ICLR'23; joint work with Xinran Gu, Longbo Huang, Sanjeev Arora)
Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically. It has been recently observed that Local SGD can not only achieve the design goal of reducing the communication overhead but also lead to higher test accuracy than the corresponding SGD baseline (Lin et al., 2020b). In this talk, I will present our recent paper that uses SDE-based analyses to understand why (and when) Local SGD generalizes better. We hope our work can inspire theory-driven designs of distributed training methods that can not only reduce wall-clock time but also help generalization.
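[Editor's note: for readers unfamiliar with the algorithm, here is a minimal toy sketch of the Local SGD communication pattern described in the abstract, not the authors' code. Each worker trains its own replica independently on its own data shard and the replicas average their parameters periodically. The argument shards is assumed to be a list of per-worker lists of (x, y) minibatches, and lr, local_steps and rounds are illustrative hyperparameters.]

# Minimal toy sketch of Local SGD (illustrative, not the authors' implementation).
# Each "worker" is a model replica trained independently on its own shard;
# every local_steps minibatches, all replicas average their parameters.
import copy
import torch

def local_sgd(model, shards, lr=0.1, local_steps=8, rounds=100,
              loss_fn=torch.nn.functional.mse_loss):
    workers = [copy.deepcopy(model) for _ in shards]                  # one replica per worker/GPU
    opts = [torch.optim.SGD(w.parameters(), lr=lr) for w in workers]
    for _ in range(rounds):
        for w, opt, shard in zip(workers, opts, shards):              # independent local updates
            for x, y in shard[:local_steps]:
                opt.zero_grad()
                loss_fn(w(x), y).backward()
                opt.step()
        with torch.no_grad():                                         # periodic parameter averaging
            for params in zip(*(w.parameters() for w in workers)):
                avg = torch.stack([p.data for p in params]).mean(dim=0)
                for p in params:
                    p.data.copy_(avg)
    return workers[0]                                                 # replicas are identical after averaging

[In this sketch, setting local_steps=1 makes the replicas synchronize after every step, which corresponds to the standard data-parallel SGD baseline the talk compares against.]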

Task-Specific Skill Localization in Fine-tuned Language Models
Abhishek Panigrahi, Princeton University
(Joint work with Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora)
Fine-tuning allows a pre-trained model to quickly pick up task-specific skills, but there has been limited study of where these newly-learnt skills reside. We introduce the term skill localization and propose a solution. Given the downstream task and a model fine-tuned on that task, we propose a simple optimization to identify a very small subset of parameters (∼0.01% of model parameters) responsible for (>95%) of the model’s performance. Skill localization improves calibration of in-distribution predictions (40-90% error reduction) and out-of-distribution (OOD) performance. In multi-task trained models, we observe a stronger notion of skill localization, where the sparse regions corresponding to different tasks are almost disjoint, and their overlap (when it happens) is a proxy for task similarity.
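[Editor's note: to make the grafting idea concrete, here is an illustrative sketch: given pre-trained and fine-tuned checkpoints of the same architecture, keep the fine-tuned values only on a sparse set of parameters and revert everything else to the pre-trained values. The paper finds this sparse set by an optimization; the top-k-by-change heuristic below is only a stand-in for that step, and keep_fraction is a hypothetical knob.]

# Illustrative sketch of grafting a sparse subset of fine-tuned parameters onto the
# pre-trained model. The paper optimizes the mask; here a simple heuristic that keeps
# the parameters with the largest change stands in for that optimization.
import torch

def graft(pretrained_state, finetuned_state, keep_fraction=1e-4):
    deltas = torch.cat([(finetuned_state[k] - pretrained_state[k]).abs().flatten()
                        for k in pretrained_state])
    k = max(1, int(keep_fraction * deltas.numel()))
    threshold = torch.topk(deltas, k).values.min()            # magnitude cutoff for the top-k changes
    grafted = {}
    for name, pre in pretrained_state.items():
        ft = finetuned_state[name]
        mask = (ft - pre).abs() >= threshold                  # sparse mask of "skill" parameters
        grafted[name] = torch.where(mask, ft, pre)            # fine-tuned values only where masked
    return grafted

[The grafted state dict can then be loaded back into the model, e.g. with model.load_state_dict(grafted), to check how much of the fine-tuned performance the sparse subset retains.]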

Modeling SGD with Stochastic Differential Equations: Theory and Applications
Sadhika Malladi, Princeton University
Mathematical analysis of finite-LR SGD often proceeds by approximating it with Itô Stochastic Differential Equations (SDEs). Mathematical justification for this approximation (e.g., (Li et al., 2019)) only applies to SGD with tiny LR. Experimental verification of the approximation is felt to be computationally infeasible. We give: (a) An efficient simulation algorithm SVAG that provably converges to the Itô SDE. We show via theory and experiments that SVAG is a good simulation. (b) A new testable condition that predicts when the linear scaling rule (Goyal et al., 2017) applies. This popular rule of thumb says that one should scale batch size and LR proportionally. (c) A novel SDE and the corresponding SVAG simulation to approximate Adam, which result in a square root scaling rule. Experiments verify this theory as well.
(Based upon:
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). Zhiyuan Li, Sadhika Malladi, Sanjeev Arora. NeurIPS ’21.
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms. Sadhika Malladi*, Kaifeng Lyu*, Abhishek Panigrahi, Sanjeev Arora. NeurIPS ’22.)
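[Editor's note: as a quick illustration of the two scaling rules contrasted in this abstract, the hypothetical helper below rescales a learning rate when the batch size changes: linearly for SGD (Goyal et al., 2017) and by the square root of the batch-size ratio for Adam, per the NeurIPS '22 paper. The paper states the Adam rule for the full set of optimizer hyperparameters; only the learning-rate part named in the abstract is sketched here, and the function and numbers are illustrative.]

import math

def scaled_lr(base_lr, base_batch, new_batch, optimizer="sgd"):
    # kappa is the factor by which the batch size grows
    kappa = new_batch / base_batch
    # linear scaling rule for SGD, square-root scaling rule for Adam
    return base_lr * (kappa if optimizer == "sgd" else math.sqrt(kappa))

# Example: batch 256 -> 1024 (kappa = 4)
#   SGD:  scaled_lr(0.1, 256, 1024, "sgd")   -> 0.4
#   Adam: scaled_lr(3e-4, 256, 1024, "adam") -> 6e-4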

Yee Whye Teh

Apr 19, 2023, 2:22:31 AM
to Hai Dang Dau, oxcsml...@googlegroups.com
Please let me know if you’d like to meet with them. Thanks. Note the unusual time of the talks.

--
Oxford Computational Statistics and Machine Learning
Website: http://csml.stats.ox.ac.uk/
GitHub: https://github.com/oxmlcs/
Events: https://github.com/oxmlcs/ML_bazaar/wiki/Machine-Learning-Bazaar-Schedule