Please join us for Lunch in Theory this Thursday 9/25 at 12:00 PM in
MCB 102.
Reminder: please bring your own lunch, as lunch will not be provided.
This week, Kabir Verchand will be speaking about estimation with missing data. Note that this talk is co-organized with the ECE AIF4S seminar, hence the change of venue to MCB 102 for this week only.
Title: Estimation with missing data: Moving beyond missing (completely) at random
Abstract: Modern data pipelines are growing both in size and complexity, introducing tradeoffs across various aspects of the associated learning challenges. A key source of complexity, and the focus of this talk, is missing data. Estimators designed to handle missingness often rely on strong assumptions about the mechanism by which data is missing, such as that the data is missing completely at random (MCAR). By contrast, real data is rarely MCAR. In the absence of these strong assumptions, can we still trust these estimators?
In this talk, I will present a framework that bridges the gap between the MCAR and assumption-free settings. This framework reveals an inherent tradeoff between estimation accuracy and robustness to modeling assumptions. Focusing on the fundamental task of mean estimation, I will then present estimators that optimally navigate this tradeoff, improving both robustness and performance. Along the way, we will crucially leverage and build upon recent results in algorithmic robust statistics.
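
For anyone unfamiliar with the MCAR assumption mentioned in the abstract, here is a small, purely illustrative Python sketch (not taken from the talk) of why the missingness mechanism matters: the plain complete-case mean is reliable when entries are dropped uniformly at random (MCAR), but becomes biased when missingness depends on the underlying values.

    # Illustrative sketch only: complete-case mean under MCAR vs.
    # value-dependent missingness.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, scale=1.0, size=100_000)  # true mean = 1.0

    # MCAR: every entry is dropped with the same probability 0.3
    mcar_mask = rng.random(x.size) < 0.3
    mean_mcar = x[~mcar_mask].mean()

    # Not MCAR: larger values are more likely to go missing
    p_miss = 1.0 / (1.0 + np.exp(-(x - 1.0)))
    mnar_mask = rng.random(x.size) < p_miss
    mean_mnar = x[~mnar_mask].mean()

    print(f"complete-case mean, MCAR:            {mean_mcar:.3f}")  # close to 1.0
    print(f"complete-case mean, value-dependent: {mean_mnar:.3f}")  # noticeably below 1.0

The talk concerns what can still be guaranteed, and at what cost in accuracy, once the convenient MCAR assumption in the first scenario is relaxed.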
Bio: Kabir Aladin Verchand is an assistant professor in the Data Sciences and Operations department at USC. Before joining USC, he received a BS in Electrical Engineering and Computer Science from UC Berkeley and a PhD in Electrical Engineering from Stanford University, followed by postdoctoral appointments in the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge and the Department of Industrial and Systems Engineering at Georgia Tech. His work lies at the intersection of optimization, machine learning, and statistics, and he is broadly interested in understanding the fundamental limits of what can be learned from data and how to achieve those limits. His work was selected as the runner-up for the Best Paper Prize for Young Researchers in Continuous Optimization at the International Conference on Continuous Optimization (2022).