SWE-Glu SF: Othello-GPT Drama!


sasha.hydrie

Oct 17, 2024, 3:06:59 AM
to SWE-Glu SF Papers Reading Group

Glu night,

Our meeting will be Saturday, October 19th, 2:30 PM @ 848 Divisadero Street. This week, we are looking at Emergent World Representations, in which researchers used mechanistic interpretability techniques to determine whether Othello-GPT maintains an internal board state. The saga continued when Neel Nanda followed up with an even stronger result.


Why Othello drama is cool:

  1. Sonnet 3 + mech interp = Sonnet 3.5

  2. Neel spent only a weekend on the project

  3. Hot damn, check out this graph!


Best,
Cheikh and Sasha

P.S. If you are somehow reading this email but are not on our listserv, join it here. If you are on our listserv, send it to your friends.



