4:00pm-5:00pm CET:
An Auditing Test to Detect Behavioral Shift in Language Models
by Matt J. Kusner, Ecole polytechnique de Montréal (Canada)
Abstract:
As language models (LMs) approach human-level performance, a comprehensive understanding of their behavior becomes crucial
to avoid potential harms. While extensive initial evaluations, including red teaming and diverse benchmarking, can establish a behavioral profile, subsequent fine-tuning or deployment modifications may alter these model behaviors in unintended ways. We study
the behavioral shift auditing problem, where the goal is to detect unintended changes in model behavior. We formalize this problem as a sequential hypothesis test.
We apply and extend a recent testing method to include a configurable tolerance
parameter that adjusts sensitivity to behavioral changes for different use cases. The
test is guaranteed to be consistent and has tight control over the Type I error rate. We
evaluate our approach using two case studies: monitoring model changes in (a) toxicity and
(b) translation performance. We find that the test can detect distributional changes in model behavior using hundreds of prompts. This talk is based on the ICLR 2025 paper:
https://openreview.net/pdf?id=h0jdAboh0o
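To make the idea of a sequential test with a tolerance parameter concrete, here is a minimal sketch. It is not the method from the paper: it is a simple one-sided betting-style test on paired score differences, and the function name `sequential_shift_test`, the parameters `eps` (tolerance) and `lam` (fixed bet size), and the assumption that each difference lies in [-1, 1] are all illustrative choices. Under the null hypothesis that the mean difference is at most `eps`, the wealth process is a nonnegative supermartingale, so by Ville's inequality the probability of ever crossing 1/alpha is at most alpha.

```python
import math


def sequential_shift_test(diffs, eps=0.05, alpha=0.05, lam=0.5):
    """One-sided betting test of H0: E[diff] <= eps.

    diffs: iterable of per-prompt score differences (updated model minus
           baseline), each assumed to lie in [-1, 1] (hypothetical setup).
    eps:   tolerance; mean shifts up to eps count as "no change".
    alpha: bound on the Type I error rate, valid at any stopping time.
    lam:   fixed bet size (an assumption; adaptive bets are also possible).

    Returns (detected, n_used).
    """
    # Keep every wealth factor strictly positive: 1 + lam * (d - eps) > 0
    # must hold even for the worst case d = -1.
    if not 0.0 < lam < 1.0 / (1.0 + eps):
        raise ValueError("lam must lie in (0, 1 / (1 + eps))")

    log_wealth = 0.0                    # log of the wealth process, W_0 = 1
    threshold = math.log(1.0 / alpha)   # reject once W_t >= 1 / alpha
    n = 0
    for n, d in enumerate(diffs, start=1):
        if abs(d) > 1.0:
            raise ValueError("differences must lie in [-1, 1]")
        # Multiply the wealth by 1 + lam * (d - eps), working in log space.
        log_wealth += math.log1p(lam * (d - eps))
        if log_wealth >= threshold:
            return True, n              # shift beyond the tolerance detected
    return False, n                     # data exhausted without rejection
```

Raising `eps` makes the test less sensitive, since small shifts then shrink the wealth instead of growing it. A two-sided variant could run the same test on `-d` as well and split `alpha` between the two directions.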
5:00pm-6:00pm CET:
Automated Testing and Safety Analysis of Deep Learning Systems
by Lionel Briand, University of Ottawa (Canada)
Abstract:
Software engineering has long sought ways to improve software testing to ensure that critical software is reliable before
deployment. The rise of deep learning (DL) software has disrupted traditional testing and analysis practices, prompting the
development of specialized methods and techniques to address the unique challenges
posed by DL. This is particularly vital in critical systems with safety implications for users and the environment.
This presentation will share findings from years of research on the automated and
practical testing of DL models and DL-enabled systems. It will also cover work on
testing-based safety analysis as a significant application of testing.