Hi all,
This week we will have a special double header theory talk. First, we will have a guest speaker, David Zheng from UIUC at 11AM, then Chandra Sekhar Mukherjee from our group will give his quals presentation at 12:30. We will have a 30 minute break in between the talks for lunch. Please feel free to attend either or both talks. The abstracts for the talks are below. I will request that the Ginsburg staff let anyone asking about theory lunch from 11AM-1PM up to the 5th floor, as last time people seemed to get upstairs faster than I could let them in. If you have any trouble getting in, please feel free to shoot me an email (
Grayso...@gmail.com).
Thanks,
Grayson
Talk 1, 11AM, David Zheng:
Title: VC dimension and their applications in topological and geometric graphs
Abstract: Minor-free graphs and geometric intersection graphs are two classes of graphs that share a surprisingly geometric property: they have very natural set systems that have bounded VC dimension. We can algorithmically exploit these set systems to design algorithms and data structures not possible in general graphs such as subquadratic time algorithms for diameter and subquadratic space data structures for distance oracles as well as reachability oracles in directed graphs.
Talk based on joint work with Adam Karczmarz (
https://arxiv.org/abs/2410.12003), and ongoing work with Timothy Chan, Hsien-Chih Chang, and others.
Talk 2, 12:30 PM, Chandra Mukherjee:
Title: A multi-core-periphery perspective: Balanced ranking and improved clustering
Abstract: Structural assumptions serve as fundamental guiding principles in the inference of unlabeled data. Two of the most well-known assumptions in graphs are the community structure and core-periphery structure. While individually extensively studied, the coexistence of these structures in the same graph remains under-explored. In this talk, we propose a novel structural assumption via a natural combination of these two structures that we denote as the ``multi-core periphery structure with communities'' (MCPC) that better captures graphs with underlying groups.
First, we observe that this structure captures certain unbalancedness (bias) in popular ranking algorithms (such as PageRank and other centrality measures), which we theoretically justify. We motivate the need for balanced ranking and design a new class of algorithms (that we call relative centrality) to overcome this issue, which we also theoretically and experimentally verify in a probabilistic graph model.
As a concrete application of the balanced ranking, we first use graph embeddings of biological data with underlying communities (cell types). We show that the top-ranked points selected by our algorithm are significantly better clusterable using traditional clustering algorithms while being more balanced (as opposed to the traditional centrality measures, the top-ranked points represent points from all underlying communities).
Next, we observe that MCPC structure emerges in graph embeddings of data with underlying communities from many different domains, such as bulk-RNA data, image data, and document datasets. Motivated by this observation and the improved clusterability of the top-ranked points, we develop a novel clustering pipeline, observing promising improvements in popular existing clustering algorithms (such as K-Means, HDBSCAN, and Louvain). In doing so, we develop a new "densification" idea that may be of independent interest.
The ranking algorithm is a joint work with Jiapeng Zhang. The clustering algorithm is an ongoing work with Joonyoung (Aaron) Bae and Jiapeng Zhang.