Simulated citizens' assemblies for AI alignment

Paul Melman

Feb 11, 2025, 5:14:52 PM
to election by jury
Saw this thread today about a fascinating paper that found innate misalignments in the values of LLMs and successfully used simulated citizens' assemblies to fine-tune them toward alignment with human values.

https://x.com/DanHendrycks/status/1889344074098057439

Link to the paper itself:
https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view

Paul Melman

Feb 12, 2025, 3:33:09 PM
to election by jury
Apparently this data might be unreliable: https://x.com/colin_fraser/status/1889416126469226941

Clay S

Feb 12, 2025, 5:03:03 PM
to election by jury
Either way, the core idea is right. This occurred to me very early on: once you have to encode ethical principles in code, and you obviously want them to be logically consistent, you inexorably arrive at utilitarianism, and then you have to state your utilities explicitly.
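
As a purely illustrative sketch of what "stating your utilities explicitly" could mean in practice (the actions, parties, and numbers below are hypothetical, not from the paper):

    # Toy utilitarian chooser: every ethical commitment becomes an
    # explicit number, which is the point -- the utilities must be stated.
    OUTCOMES = {
        # action -> welfare produced for each affected party (hypothetical)
        "action_a": {"alice": 3.0, "bob": -2.0},
        "action_b": {"alice": 1.0, "bob": 1.0},
    }

    def total_utility(welfare):
        # Classic utilitarian aggregation: sum welfare across parties.
        return sum(welfare.values())

    # Choose the action that maximizes aggregate utility.
    best = max(OUTCOMES, key=lambda action: total_utility(OUTCOMES[action]))
    print(best)  # -> action_b

Even this toy version forces choices that are usually left implicit: whose welfare counts, how it is measured, and how it is aggregated.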