GSAI Seminar October 2025 – Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power (Jobst Heitzig)

Orpheus Lummis

Sep 17, 2025, 11:17:46 AM
to guaranteed-safe-ai
You are invited to the October 2025 edition of the Guaranteed Safe AI Seminars:

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig – Senior Mathematician & AI Safety Designer

Thursday, October 9, 1:00–2:00 PM EDT.
RSVP: https://luma.com/susn7zfs

Abstract:

Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, and the power balance in human-AI interaction and in international AI governance. At the same time, power, understood as the ability to pursue diverse goals, is essential for wellbeing.

This talk explores the idea of promoting both safety and wellbeing by explicitly requiring AI agents to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans’ bounded rationality and social norms and, crucially, considers a wide variety of possible human goals. By design, an agent that fully maximized this metric would be "guaranteed" (relative to the world model used) not to disempower humanity (in the sense of the adopted definition of "power"). Still, we propose to only softly maximize the metric, to account for model error and for aspects of power not captured by it.
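
To make the shape of such an objective concrete, here is a minimal Python sketch of an inequality- and risk-averse aggregation of per-human, per-goal power estimates, together with a "soft" (softmax) maximization step. The function names, the power-mean form, and the parameter values are illustrative assumptions, not the definitions from the paper.

import numpy as np

def aggregate_power(power, risk_p=-1.0, ineq_p=-1.0):
    """power: array (n_humans, n_goals) of strictly positive estimates of
    each human's ability to achieve each candidate goal."""
    # Risk-averse aggregation over goals: a power mean with exponent < 1
    # weights low-power goals more heavily than a plain average would.
    per_human = np.mean(power ** risk_p, axis=1) ** (1.0 / risk_p)
    # Inequality-averse aggregation across humans, same construction.
    return np.mean(per_human ** ineq_p) ** (1.0 / ineq_p)

def soft_argmax(action_values, temperature=1.0):
    """Soft maximization: sample actions with probability increasing in
    their value instead of always taking the exact argmax."""
    logits = np.asarray(action_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

In this sketch the temperature of soft_argmax plays the role the abstract assigns to soft maximization: it keeps the agent from pushing the metric to its literal optimum when the metric or the world model may be wrong.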

We derive algorithms for computing this metric from a given world model by backward induction, or for approximating it via a form of multi-agent reinforcement learning. We illustrate the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe the instrumental sub-goals it is likely to imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems, one that is safer than direct utility-based objectives.
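
As a toy illustration of the backward-induction route, the sketch below computes, for a small deterministic finite-horizon world model, the fraction of candidate goals a human could still reach from each state. That "reachable fraction of goals" notion of power and the data structures are assumptions chosen for illustration, not the metric derived in the paper.

import numpy as np

def power_by_backward_induction(transitions, goal_states, horizon):
    """transitions[s][a] = next state (deterministic, for brevity);
    goal_states: bool array (n_goals, n_states), True where a state achieves a goal.
    Returns power[s] = fraction of goals reachable from s within the horizon."""
    reachable = goal_states.astype(float)            # step 0: goals achieved right now
    for _ in range(horizon):                         # induct over remaining time steps
        best_next = np.zeros_like(reachable)
        for s, actions in enumerate(transitions):
            for s_next in actions:                   # best action per goal
                best_next[:, s] = np.maximum(best_next[:, s], reachable[:, s_next])
        reachable = np.maximum(reachable, best_next)
    return reachable.mean(axis=0)                    # average achievability per state

# Example: three states in a line (0 -> 1 -> 2), actions = stay or move right,
# one goal achieved in state 1 and another in state 2.
transitions = [[0, 1], [1, 2], [2, 2]]
goal_states = np.array([[False, True, False], [False, False, True]])
print(power_by_backward_induction(transitions, goal_states, horizon=2))
# -> [1.  1.  0.5]  (from state 2, only the second goal is still reachable)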

Paper to read: https://arxiv.org/abs/2508.00159