SWE-Glu SF: Fantastic RL-based Training Processes and Where to Find Them

sasha.hydrie

Sep 19, 2024, 1:51:18 AM
to SWE-Glu SF Papers Reading Group

Glu night,

Our discussion will be Saturday, September 21st, 2:30 PM @ 848 Divisadero Street. Doors open at 12:30 PM for anyone who'd like to come early and read with us.


OpenAI’s o1 exhibits some interesting behaviors (and some concerning ones). Let’s find out why!


This week’s theme is post-training, with an emphasis on reinforcement learning techniques. Little is known (publicly) about what the labs are doing, and the literature is all over the place. Skim Lilian Weng’s wonderful posts A (Long) Peek into Reinforcement Learning and Curriculum for Reinforcement Learning to be prepared for anything.
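
If the vocabulary in those posts is new to you, here is the whole idea of a policy gradient in one toy cell. This is a made-up multi-armed bandit in PyTorch, a REINFORCE-style sketch for orientation only, not anything a lab actually runs:

import torch

torch.manual_seed(0)
NUM_ACTIONS = 4  # toy bandit, not a language model

# "Policy": a single learnable logit vector over actions.
logits = torch.zeros(NUM_ACTIONS, requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

def reward(action: int) -> float:
    # Hypothetical reward: action 2 is secretly the best arm.
    return 1.0 if action == 2 else 0.0

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    # REINFORCE: push up the log-prob of sampled actions, scaled by reward.
    loss = -dist.log_prob(action) * reward(action.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # mass should concentrate on action 2

Squint, swap the logit vector for a language model and the bandit reward for a preference score, and you have the skeleton of RL-flavored post-training.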


Once again, we encourage everyone to choose a paper and identify a few highlights to share with the larger group. Interesting areas include bootstrapping reasoning, progressive distillation, reward models, and even more progressive distillation.
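
For the reward-model thread in particular, the core trick is small enough to sketch: score preference pairs and train with a pairwise (Bradley-Terry) loss so chosen responses outscore rejected ones. Everything below is invented for illustration; a linear scorer over random "embeddings" stands in for what would really be a fine-tuned LLM:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM = 16  # toy embedding size, picked arbitrarily

# Stand-in reward model: maps a response embedding to a scalar score.
reward_model = torch.nn.Linear(DIM, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Fake preference data: (chosen, rejected) response embeddings.
chosen = torch.randn(64, DIM) + 0.5
rejected = torch.randn(64, DIM) - 0.5

for _ in range(100):
    margin = reward_model(chosen) - reward_model(rejected)
    # Bradley-Terry: maximize log sigma(score_chosen - score_rejected).
    loss = -F.logsigmoid(margin).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())  # should shrink toward 0 as the margin grows

The same loss shows up, with an LLM backbone instead of a linear layer, in many of the RLHF papers you might pick this week.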


Why RL-based training is cool:

  1. No more pre-training tokens? No problem!

  2. RL has been pivotal in several areas (one of these is not like the others)

  3. They have played us for absolute fools; this is real reasoning done by real models.


Best,
Cheikh and Sasha

P.S. If you are somehow reading this email but not on our listserv, join it here. If you are on our listserv, send it to your friends.