GS AI Seminar December 2024 – Compact Proofs of Model Performance via Mechanistic Interpretability (Louis Jaburi)

Orpheus Lummis

Dec 8, 2024, 10:34:51 AM
to guarantee...@googlegroups.com
You are invited to the December 2024 edition of the Guaranteed Safe AI Seminars:

Compact Proofs of Model Performance via Mechanistic Interpretability
by Louis Jaburi – Independent researcher

Thu December 12, 18:00-19:00 UTC
Join: https://lu.ma/g24bvacw

Description: Generating proofs about neural network behavior is a fundamental challenge, as their internal structure is highly complex. With recent progress in mechanistic interpretability, we have better tools to understand neural networks. In this talk, I will present a novel approach that leverages interpretations to construct rigorous (and compact!) proofs about model behavior, based on recent work [1][2]. I will explain how understanding a model's internal mechanisms can enable stronger mathematical guarantees about its behavior, and discuss how these approaches connect to the broader guaranteed safe AI framework. Drawing on practical experience, I will share key challenges encountered and outline directions for scaling formal verification to increasingly complex neural networks.

[1] Compact Proofs of Model Performance via Mechanistic Interpretability (https://arxiv.org/abs/2406.11779)
[2] Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations (https://arxiv.org/abs/2410.07476)
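
For intuition only, here is a tiny hypothetical Python sketch (illustrative, not taken from [1] or [2]): an exhaustive proof of a toy model's correctness checks every input pair, whereas a structural fact about the model, of the kind a mechanistic inspection of the weights might establish (here, symmetry in its two arguments), lets the same conclusion follow from fewer input-level checks.

# Toy illustration of a "compact proof" of model performance.
import itertools

VOCAB = range(8)  # small enough that exhaustive checking is feasible

def model(a, b):
    # Stand-in for a trained model; here it happens to compute max exactly.
    return a if a >= b else b

def brute_force_proof():
    # Naive proof of correctness: check every input pair, |VOCAB|^2 checks.
    pairs = list(itertools.product(VOCAB, repeat=2))
    n_errors = sum(model(a, b) != max(a, b) for a, b in pairs)
    return n_errors == 0, len(pairs)

def structured_proof():
    # "Mechanistic" shortcut: suppose an inspection of the weights has already
    # established that the model is symmetric in its two arguments (a global,
    # structural fact). Then correctness only needs to be checked on ordered
    # pairs a >= b, roughly halving the number of input-level checks.
    checks = 0
    for a, b in itertools.product(VOCAB, repeat=2):
        if a >= b:
            checks += 1
            if model(a, b) != a:
                return False, checks
    return True, checks

if __name__ == "__main__":
    ok_bf, n_bf = brute_force_proof()
    ok_st, n_st = structured_proof()
    print(f"brute force: holds={ok_bf} with {n_bf} checks")
    print(f"structured:  holds={ok_st} with {n_st} checks")

In the talk's setting the models are small transformers, and the structural facts come from mechanistic interpretations rather than being assumed.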


