ECE NYU Tandon Seminar Series on Modern AI: Mark Schmidt


Anna Choromanska

Feb 12, 2025, 9:47:57 AM
to Anna Choromanska, Urs Muller, Alina Beygelzimer, Yoshua Bengio, Umar Syed, Apoorv Agarwal, Kurt Becker, Kaan Ozbay, c2s...@nyu.edu, in...@catt.poly.edu, IN...@nycmedialab.org, C...@nyu.edu, bakh...@nyu.edu, Kate Crawford, Steven Kuyan, kcg...@cs.nyu.edu, fergu...@cs.nyu.edu, lecungroup, Eve D Henderson, Prof. Sandeep Shukla, jmil...@g.harvard.edu, Sparsh Mittal, deb...@cse.iitkgp.ernet.in, Women in Machine Learning, deep-l...@googlegroups.com, Vikram Kapila, Iskender Sahin, Magued Iskander, Gene DiResta, Guido Gerig, sha...@cs.cmu.edu, Joanne Walsh, Samory K. Kpotufe, daniel....@yale.edu, koh...@seas.harvard.edu, rox...@cs.columbia.edu, Rosemary Addarich, machine-lear...@googlegroups.com, collo...@lists.cs.columbia.edu, fran...@orabona.com, Daniel Hsu, AI Now Info, Lisa Hellerstein, Jelena Kovacevic, katepalli....@nyu.edu, ecefac_nyu.edu <ecettt_nyu.edu>, CSEFac <csefac-group_nyu.edu>, ecefac_nyu.edu, Krzysztof Choromański, Narges Razavian, Razavian, Narges, Mary K Cowman, Berman, Russell, Jennifer Stein, Wenbo Gao, Semiha Ergan, Sergül Aydöre, Raquel C Thompson, Mohamed....@shell.com, valeri...@nyu.edu, Michele James, Rocio Araujo, Yann LeCun, Riccardo Lattanzi, Cesar Lema, Larry Jackel, Justin Hendrix, Public Affairs, Kathryn Angeles, Erica Matsumoto, Julia Kempe, Joan Bruna, Remi Moss, Brian Kingsbury, Ronny Luss, Murray Campbell, Lior Horesh, Parikshit Ram, Songtao Lu, dpenn...@gmail.com, John Langford, Christopher Musco, azelc...@gmail.com, jne...@nyu.edu, agpa...@gmail.com, Rose Ampuero, David Blei, Michael Richardson, Ingrid Redman, Mari Rich, Karl P Greenberg, Sheldon C Smith, Kathleen Hamilton, Elena Olivo, Meredith Whittaker, mar...@ainowinstitute.org, Zexing Xu, nt2...@nyu.edu, Jingtong Su, Yunzhen Feng, Sixin Zhang, webm...@cs.columbia.edu, Jing Wang, Κρίστι Τοπολλάι, Tolga Dimlioglu, Haoran Zhu, Shihong Fang, Apoorva Nandini Saridena, Yunfei Teng, Maryam Majzoubi, Erik Nauman, Rodriguez Mark, Sara A Solla, Sara A Solla, Mihir Upadhyay, Michael Picheny
Dear All,

The first speaker of the Spring 2025 NYU Tandon ECE Seminar Series on Modern AI is Mark Schmidt from the University of British Columbia. He will speak on February 18 at 11:00 am. The event will be held in person and will also be broadcast via Zoom.

In-person location: 6 MetroTech (6MTC), Maker Eventspace

Details of the event are provided below. NYU Tandon looks forward to seeing you all!

Department of Electrical & Computer Engineering
ECE Special Seminar Series, Spring 2025: Modern Artificial Intelligence
Tuesday, February 18, 2025
Time: 11:00 am
In person: 6 MetroTech, Maker Eventspace
Online: Zoom
Contact: ece-anno...@nyu.edu

Mark Schmidt
University of British Columbia

 
Mark Schmidt is a professor in the Department of Computer Science at the University of British Columbia. His research focuses on developing faster algorithms for large-scale machine learning and on exploring applications of machine learning. He is a Canada Research Chair, an Alfred P. Sloan Fellow, an NSERC Arthur B. McDonald Fellow, and a CIFAR Canada AI Chair with the Alberta Machine Intelligence Institute (Amii). Along with Nicolas Le Roux and Francis Bach, Mark was awarded the 2018 SIAM/MOS Lagrange Prize in Continuous Optimization.
Why does Adam work so well for LLMs? And can we find optimal per-variable step sizes?
 
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, it is unclear why the gap between Adam and SGD is often big for large language models (LLMs) but small for computer vision benchmarks. Recent work proposed that Adam works better for LLMs due to heavy-tailed noise in the stochastic gradients. We show evidence that the noise is not a major factor in the performance gap between SGD and Adam. Instead, we show that a key factor is the class imbalance found in language tasks. In particular, the large number of low-frequency classes causes SGD to converge slowly but has a smaller effect on Adam and sign descent. We show that a gap between SGD and Adam can be induced by adding a large number of low-frequency classes to computer vision models or even to linear models. We further prove in a simple setting that gradient descent converges slowly while sign descent does not.
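
As a rough illustration of the last claim, the sketch below compares gradient descent with sign descent on a toy objective. Everything in it is an assumption made for illustration and is not taken from the talk: the weighted quadratic loss, the Zipf-like frequencies `freq`, the step-size choices, and all function names. The idea is only that when a class's gradient is scaled by its frequency, gradient descent moves rare-class parameters very slowly, while sign descent ignores that scale.

```python
import numpy as np

# Hypothetical toy objective (not from the talk): class k contributes
# freq[k] * 0.5 * (w[k] - 1)^2, so rare classes produce tiny gradients.
K = 1000
freq = 1.0 / np.arange(1, K + 1)
freq /= freq.sum()  # Zipf-like class frequencies that sum to 1


def gradient(w):
    return freq * (w - 1.0)


def per_class_loss(w):
    # Unweighted, so the state of rare classes stays visible.
    return 0.5 * (w - 1.0) ** 2


def run(update, steps=200):
    w = np.zeros(K)
    for t in range(steps):
        w = update(w, gradient(w), t)
    return per_class_loss(w)


def gd_update(w, g, t):
    # Step size 1/L, where L = freq.max() is the largest curvature.
    return w - (1.0 / freq.max()) * g


def sign_update(w, g, t):
    # Sign descent ignores gradient magnitude; the decaying step
    # (an arbitrary choice here) damps oscillation around the optimum.
    return w - (0.5 / (t + 1)) * np.sign(g)


gd_loss = run(gd_update)
sign_loss = run(sign_update)
print("GD   loss, most / least frequent class:", gd_loss[0], gd_loss[-1])
print("Sign loss, most / least frequent class:", sign_loss[0], sign_loss[-1])
```

In this toy, gradient descent fits the most frequent class almost immediately but leaves the rarest class largely unfit after 200 steps, whereas sign descent treats all classes identically. Adam with both momentum parameters set to zero and no epsilon term reduces to sign descent, which is one reason sign descent is a common proxy for Adam in this kind of analysis.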

A key component of the Adam optimizer's success is using per-variable step sizes. However, neither Adam nor any other "adaptive" algorithm is known to perform within any known factor of the optimal fixed per-variable step sizes for the textbook problem of minimizing a smooth strongly-convex function. We propose the first method to update per-variable step sizes that provably performs within a known factor of the optimal step sizes. The method is based on a multi-dimensional backtracking procedure that adaptively uses hyper-gradients to generate cutting planes that reduce the search space for optimal step sizes. As black-box cutting-plane approaches like the ellipsoid method are computationally prohibitive, we develop practical linear-time variants for this setting.
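
To make the terms "per-variable step sizes" and "hyper-gradient" concrete, here is a minimal sketch. It is not the multi-dimensional backtracking method from the talk; the quadratic test function, the matrix `A`, the points `x`, `p`, `q`, and all function names are hypothetical. It computes the gradient of the post-step loss with respect to the step-size vector, checks it against finite differences, and shows the generic convexity argument by which such a gradient rules out a halfspace of candidate step sizes.

```python
import numpy as np

# Hypothetical smooth, strongly convex test function: f(x) = 0.5 * x^T A x.
A = np.array([[10.0, 1.0],
              [1.0, 0.5]])  # symmetric positive definite


def f(x):
    return 0.5 * x @ A @ x


def grad(x):
    return A @ x


def step(x, p):
    """One gradient step with per-variable step sizes p (elementwise product)."""
    return x - p * grad(x)


def hypergradient(x, p):
    """Gradient of h(p) = f(x - p * grad(x)) with respect to p, by the chain rule."""
    return -grad(step(x, p)) * grad(x)


x = np.array([1.0, 2.0])
p = np.array([0.05, 0.5])
hg = hypergradient(x, p)

# Central finite-difference check of the hyper-gradient.
eps = 1e-6
fd = np.array([
    (f(step(x, p + eps * e)) - f(step(x, p - eps * e))) / (2 * eps)
    for e in np.eye(2)
])
print("hyper-gradient:    ", hg)
print("finite differences:", fd)

# Because h is convex in p, any q with <hg, q - p> > 0 gives a worse step
# than p, so that halfspace of step sizes can be discarded.
q = p - 0.01
print("cut applies to q:", hg @ (q - p) > 0, "| q is worse:", f(step(x, q)) > f(step(x, p)))
```

A negative hyper-gradient entry means that increasing that coordinate's step size would have reduced the loss after the step, which is the kind of signal a cutting-plane search over step sizes can exploit.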

This event is free and open to the public.

The Seminar Series in Modern Artificial Intelligence is held at NYU Tandon School of Engineering and is hosted by the Department of Electrical and Computer Engineering. Organized by Professor Anna Choromanska, the series aims to bring together faculty and students to discuss the most important research trends in the world of AI. The speakers include world-renowned experts whose research is making an immense impact on the development of new machine learning techniques and technologies and helping to build a better, smarter, more-connected world.

Regards,

--
Anna Choromanska

Associate Professor

Alfred P. Sloan Fellow

Department of Electrical and Computer Engineering

NYU Tandon School of Engineering

New York University

Room 802

370 Jay Street

New York, NY 11201, USA

Office phone: 646.997.0269

ac5455 at nyu dot edu

achoroma at gmail dot com

https://engineering.nyu.edu/faculty/anna-choromanska

