Good evening,
Our next meeting will be Saturday, August 24th, at 2:30 PM @ 848 Divisadero Street. We have been growing 300% week-over-week and are now searching for a new meeting space; if you have any leads, please email us.
This week's paper is "RoFormer: Enhanced Transformer with Rotary Position Embedding" by Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu (2021) arxiv.org/abs/2104.09864
Why this is cool:
1. This positional-encoding scheme is behind nearly every frontier model, as part of the mythical Noam architecture*
2. All of it takes fewer than 20 lines to implement (see the sketch after this list, or grab a repo to experiment with)
3. Add a random Reddit thread and a forced acronym, and you get graphs like this
This week's paper is light on math (compared to last week's, at least) but rich in intuition and history! We'll certainly take some time to work through why it works and how we got here. For the attention-challenged among you, this video is a great primer on the paper.
We're trying something new this week: in the spirit of the listserv, feel free to respond with your initial thoughts.
How does the choice of Θ impact the encoding? What properties are required?
Which parts of the paper are the most sketchy?
Were you impressed by the results? How did RoPE become so dominant? (What happened to ALiBi?)
Thank you to the 15ish (!) of you who joined us at our third meeting.
Best,
Cheikh and Sasha
P.S. If you are somehow reading this email but not on our listserv, join it here. If you are on our listserv, send it to your friends.
*I still suspect that Cheikh made up this term himself