Lunch In Theory This Thursday (12:00, 11/20, GCS 502c)

6 views

Skip to first unread message

Devansh Gupta

unread,

Nov 19, 2025, 4:03:15 PM (9 days ago) Nov 19

to CS Theory Group, usc-t...@googlegroups.com, USC Theory Group, Alimohammadi, Yeganeh

Hi all,

Please join us for Lunch in Theory this Thursday, 11/20 at 12:00 in GCS 502c.

Reminder: Please bring your own lunch, as lunch will not be provided.

This week we have Chandra Shekhar Mukherjee, giving his thesis proposal. Please find the title and abstract attached.

We look forward to having you all in the talk.

Best,

Devansh

Title: Characterizing and Exploiting the Interplay of Structure and Learnability in Unsupervised and Semi-Supervised Learning

Abstract: Assumptions about geometric and distributional structure are ubiquitous in unsupervised learning. This thesis characterizes how such underlying structures affect learnability and develops algorithms that exploit these structures across both theoretical and applied settings.

The first part of the thesis focuses on community detection in the stochastic block model (SBM), a canonical model for graphs with latent communities. Here, our work proposed the first parameter-free community recovery algorithm with near-optimal performance, while partially resolving conjectures of Van Vu and Emmanuel Abbe on the behavior of standard spectral algorithms (NeurIPS 2023, SODA 2024).

Motivated by the limitations of the SBM in modeling real data, the second part examines graphs arising from single-cell RNA-seq (scRNA-seq). These graphs deviate substantially from classical assumptions. We identify a novel organization—multi-core periphery structure with communities (MCPC)—and introduce the concept of relative centrality to extract low-variance representatives of each community. This structure appears consistently across diverse scRNA-seq datasets and improves downstream biological inference (ICLR 2025).

Building on these observations, the final part identifies a novel density–geometry correlation in real datasets with ground-truth labels: dense regions exhibit smoother, more linear geometry, while sparse regions show greater non-linearity and overlap. This phenomenon recurs across scRNA-seq and bulk RNA-seq data, image datasets, self-supervised embeddings, and the learned representation spaces of semi-supervised models. We observe that performance of a large category of algorithms strongly correlates with this organization—unsupervised and semi-supervised methods succeed in dense, geometrically stable regions and degrade in sparser layers.

These findings lead to three main applications:

(1) a layer-extraction, clustering, and visualization framework for scRNA-seq that isolates stable transcriptomic states;

(2) a clustering enhancement framework that boosts simple algorithms such as K-Means and HDBSCAN to compete with state-of-the-art manifold methods at far lower cost;

(3) geometry-aware improvements to CNN-based semi-supervised learning on standard image benchmarks.

Together, these results advance a unified view of how density and geometry govern learnability.

Devansh Gupta

unread,

Nov 20, 2025, 2:59:11 PM (8 days ago) Nov 20

to CS Theory Group, usc-t...@googlegroups.com, USC Theory Group, Alimohammadi, Yeganeh

Hi all,

Please find the zoom link attached: https://usc.zoom.us/j/7808315301.

Best,

Devansh

Reply all

Reply to author

Forward

0 new messages