SWE-Glu SF: Scaling Laws; Chinchilla and more

sasha.hydrie
Aug 30, 2024, 10:16:47 PM
to SWE-Glu SF Papers Reading Group

Glu evening,

Apologies for the late email.

Our next meeting will be Saturday, August 31st, 2:30 PM @ 848 Divisadero Street. We have stopped growing 300% week-over-week but are still searching for a new meeting space; if you have any leads, please email us.

We are trying a different structure: this week's theme is scaling laws, and our base paper is "Training Compute-Optimal Large Language Models" by Hoffmann et al. (2022), https://arxiv.org/abs/2203.15556. Aim to understand the high-level insights. Beyond that, choose any paper in or adjacent to scaling laws and we'll run mini paper presentations. If you are somewhat unsure what to read, Epoch AI has several great articles on scaling laws with rich references. If you are quite unsure what to read, interesting areas include inference-training trade-offs, theory with manifolds, and transfer learning.
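
If you want a concrete feel for the headline result before Saturday, here is a rough back-of-the-envelope sketch in Python (ours, not the paper's code), assuming the common C ≈ 6·N·D training-FLOPs approximation and the paper's roughly-20-tokens-per-parameter rule of thumb:

    import math

    def compute_optimal(flops_budget):
        # Chinchilla-style sizing: C = 6*N*D with D = 20*N implies C = 120*N^2
        n_params = math.sqrt(flops_budget / 120)  # model parameters
        n_tokens = 20 * n_params                  # training tokens
        return n_params, n_tokens

    # Gopher's ~5.76e23-FLOP budget roughly recovers Chinchilla's
    # ~70B parameters and ~1.4T training tokens
    n, d = compute_optimal(5.76e23)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")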

 
Why scaling laws are cool:
1. One simple equation explains a lot of ML training (see below)
2. They let us quantify the bitter lesson
3. Oops, should have used more data

Here are some questions we will likely discuss. Feel free to post preliminary thoughts.

  1. What claims do scaling laws allow us to make? Are any particularly robust?

  2. Where does the "E" term, the irreducible loss in the equation above, come from?

  3. Which areas of research seem the most promising?


Thank you to the 10ish of you who joined us at our third meeting.

Best,
Cheikh and Sasha

P.S. If you are somehow reading this email but not on our listserv, join it here. If you are on our listserv, send it to your friends.





