SWE-Glu SF: Crosscoders!


sasha.hydrie

Oct 31, 2024, 2:10:32 AM
to SWE-Glu SF Papers Reading Group
Glu night,

Our meeting will be Saturday, November 2nd, at 2:30 PM @ 848 Divisadero Street. This week SWE-Glu is taking on the state of the art in mechanistic interpretability: crosscoders, a variant of sparse autoencoders that lets us understand features across layers and even across models! This preliminary work builds on many of Anthropic's prior results, so if you are new to mech interp, we recommend choosing a single section and really homing in on it.


Why crosscoders are cool:

  1. Explaining one model is cool, but explaining many models is cooler

  2. My employer is doing research on this and got scooped :(

  3. Learn where the role-playing feature went


Best,
Cheikh and Sasha

P.S. If you are somehow reading this email but are not on our listserv, join it here. If you are on our listserv, send it to your friends.


