Compact Proofs of Model Performance via Mechanistic Interpretability
by Louis Jaburi – Independent researcher
Description: Generating proofs about neural network behavior is a fundamental challenge, because their internal structure is highly complex. With recent progress in mechanistic interpretability, we now have better tools for understanding neural networks. In this talk, I will present a novel approach that leverages interpretations to construct rigorous (and compact!) proofs about model behavior, based on recent work [1][2]. I will explain how understanding a model's internal mechanisms can enable stronger mathematical guarantees about its behavior, and discuss how these approaches connect to the broader guaranteed safe AI framework. Drawing on practical experience, I will share key challenges encountered and outline directions for scaling formal verification to increasingly complex neural networks.
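To give a flavor of the idea, here is a minimal toy sketch (my own illustration, not the construction from [1][2]): an interpretation of what mechanism a model implements can certify its behavior far more cheaply than brute-force enumeration. Below, a noisy linear "model" is interpreted as computing a majority vote; that interpretation yields a certified accuracy lower bound in O(n) work, which we then validate against an O(2^n) brute-force check. All names and the task are hypothetical.

```python
import itertools
import math
import numpy as np

n = 12                                    # input length (tiny, so brute force stays feasible)
rng = np.random.default_rng(0)

# Ground-truth task: majority vote on binary inputs x in {0,1}^n.
def truth(x):
    return int(2 * x.sum() > n)           # 1 iff more than half the bits are set

# "Trained" model: a noisy linear threshold, f(x) = 1 iff w.x > b.
w = 1.0 + 0.02 * rng.standard_normal(n)   # weights close to the all-ones direction
b = n / 2 + 0.05                          # bias close to the majority threshold
def model(x):
    return int(w @ x > b)

# Interpretation: the model approximately implements majority via near-uniform
# weights. Compact proof: on binary inputs, |(w.x - b) - (sum(x) - n/2)| is at
# most delta = ||w - 1||_1 + |b - n/2|, so f agrees with majority whenever the
# margin |sum(x) - n/2| exceeds delta.
delta = np.abs(w - 1.0).sum() + abs(b - n / 2)

# Count the inputs the proof cannot certify: those whose bit count k satisfies
# |k - n/2| <= delta. This takes O(n) binomial terms instead of O(2^n) forward passes.
uncertified = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) <= delta)
certified_accuracy = 1 - uncertified / 2 ** n
print(f"certified accuracy >= {certified_accuracy:.4f}")

# Brute-force check (only possible because n is tiny): the model's true
# accuracy must be at least the certified bound.
correct = sum(model(np.array(bits)) == truth(np.array(bits))
              for bits in itertools.product((0, 1), repeat=n))
true_accuracy = correct / 2 ** n
print(f"true accuracy       = {true_accuracy:.4f}")
assert true_accuracy >= certified_accuracy
```

The point of the toy is the trade-off the talk is about: the interpretation (near-uniform weights implementing majority) turns an exponential verification problem into a short, checkable argument, at the cost of conceding the small uncertified region near the decision boundary.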