Simulated citizens' assemblies for AI alignment

Paul Melman

Feb 11, 2025, 5:14:52 PM
to election by jury
Saw this thread today about a fascinating paper that found innate misalignments in the values of LLMs and successfully used simulated citizens' assemblies to fine-tune them toward alignment with human values.

https://x.com/DanHendrycks/status/1889344074098057439

Link to the paper itself:
https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view

Paul Melman

Feb 12, 2025, 3:33:09 PM
to election by jury
Apparently this data might be unreliable: https://x.com/colin_fraser/status/1889416126469226941

Clay S

Feb 12, 2025, 5:03:03 PM
to election by jury
Either way, the core idea is right. This occurred to me very early on: once you have to encode ethical principles in code, and you obviously want them to be logically consistent, you inexorably arrive at utilitarianism, and then you have to state your utilities explicitly.
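
As a purely illustrative sketch of what "stating your utilities explicitly" could mean in practice (the actions, parties, and numbers below are hypothetical, not from the paper):

    # Toy utilitarian chooser: every ethical commitment becomes an
    # explicit number, which is the point -- the utilities must be stated.
    OUTCOMES = {
        # action -> welfare produced for each affected party (hypothetical)
        "action_a": {"alice": 3.0, "bob": -2.0},
        "action_b": {"alice": 1.0, "bob": 1.0},
    }

    def total_utility(welfare):
        # Classic utilitarian aggregation: sum welfare across parties.
        return sum(welfare.values())

    # Choose the action that maximizes aggregate utility.
    best = max(OUTCOMES, key=lambda action: total_utility(OUTCOMES[action]))
    print(best)  # -> action_b

Even this toy version forces choices that are usually left implicit: whose welfare counts, how it is measured, and how it is aggregated.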