Glu changes,
…think of this as a <|pause|> and not an <|endoftext|> token…
SWEGlu will be going on a ~3-week hiatus while Sasha and I touch some grass and pursue individual life goals. In the meantime, you can catch up on all the interpretability research we’ve been keeping you away from. We highly encourage you all to keep learning and to keep meeting people to learn from: distillation is still the best way for small models to gain major improvements in performance.
Until then, feel more than welcome to send us papers or discussions you think are worth talking about, and keep us in the event-loop on everything you’re working on. Keep loving ML, take care of yourselves, and thanks for the first half.
Best,
Cheikh and Sasha
P.S. If you are somehow reading this email but are not on our listserv, join it here. If you are on our listserv, send it to your friends.