Emergent Abilities of Large Language Models (pdf)


Alan Timm

Sep 20, 2023, 3:51:36 PM
to RSSC-List
https://openreview.net/pdf?id=yzkSU5zdwD

Abstract:
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
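To make that definition concrete, here's a toy example in Python (the numbers are invented, not taken from the paper): if a downstream metric sits near chance for smaller models and then jumps at scale, fitting the small-model points and extrapolating badly underpredicts the large model, even though loss improves smoothly with scale.

# Illustrative sketch only -- the numbers below are made up, not from the paper.
# It shows why extrapolating a downstream metric from small models fails
# when the metric "emerges" (jumps) at scale.
import numpy as np

params = np.array([1e8, 1e9, 1e10, 1e11, 1e12])     # model sizes (invented)
task_acc = np.array([0.0, 0.01, 0.02, 0.05, 0.62])  # near-random, then a jump

# Fit a line to the four smaller models (in log10 params) and extrapolate.
coef = np.polyfit(np.log10(params[:4]), task_acc[:4], 1)
predicted = np.polyval(coef, np.log10(params[-1]))

print(f"extrapolated accuracy at 1e12 params: {predicted:.2f}")   # ~0.06
print(f"'observed' accuracy at 1e12 params:   {task_acc[-1]:.2f}")  # 0.62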

And because this is hosted on openreview.net, you can "review" the peer reviews too.

Chris Albertson

Sep 20, 2023, 11:17:12 PM
to Alan Timm, RSSC-List
My guess is that as the models get larger, they will asymptotically approach a limit.  We are still at the midpoint of the curve.  In other words, it will get harder and harder to elicit new emergent behaviors.

It will be interesting to see where this stops: with trillion-parameter models, or with a hundred trillion.  Or maybe we are already close to the limit?

We may never know the answer, because before we get to a trillion, someone will invent the next new thing and interest in these LLMs will wane.
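Just to make the "midpoint of the curve" picture concrete, here's a toy logistic curve in log(parameters).  The numbers are completely made up; the point is only that past the midpoint, each 10x in scale buys a smaller and smaller gain.

# Toy illustration of the "midpoint of the curve" intuition -- a logistic
# curve over log10(parameters). None of these numbers are real measurements.
import numpy as np

log_params = np.linspace(8, 15, 8)                 # 1e8 .. 1e15 params
capability = 1 / (1 + np.exp(-(log_params - 11)))  # midpoint near 1e11

for lp, c, dc in zip(log_params, capability,
                     np.gradient(capability, log_params)):
    print(f"1e{lp:.0f} params: capability {c:.2f}, marginal gain {dc:.2f}")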



Alan Timm

Sep 21, 2023, 11:58:25 PM
to RSSC-List
That's where things get interesting.  People thought that we had reached a limit around GPT-3, with 175B parameters.

Then in April of last year Google released a paper about their PaLM model with 540B parameters and the new capabilities that became available at that scale.

And all that was before RLHF (reinforcement learning from human feedback) and other types of fine-tuning.

I'm constantly stunned by the capabilities of GPT-4.  Someone unofficially leaked that it has an MoE (mixture-of-experts) architecture with 8 "experts" of about 220B parameters each.
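For anyone curious what a mixture-of-experts layer actually does, here's a generic top-k routing sketch in PyTorch.  It is not GPT-4's code (the 8 x 220B figure is an unverified leak); it just shows the idea of a learned gate picking a couple of experts per token and mixing their outputs.

# Generic top-k mixture-of-experts sketch, not any particular model's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # route tokens to chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 512)
print(MoELayer()(x).shape)   # torch.Size([16, 512])

The appeal is that only top_k experts run per token, so total parameter count can grow much faster than per-token compute.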

I'm just happy that I'm able to run local models that approach ChatGPT's capabilities.  :-)

