Notes from Yanfu presentation

Data science RiP

Nov 13, 2025, 6:18:30 PM
to Data science RiP

πŸ“ Research-in-Progress (RiP): Monthly Data Science Seminar - Dr. Yanfu Zhang (William & Mary College) Notes

🕞 Started at 2:56 PM on 13 Nov, lasted 1h 9m


Key Points
  • Avinash Sahu introduced Dr. Yanfu Zhang as an assistant professor in Computer Science at William & Mary who received his Ph.D. in Electrical and Computer Engineering from UTD under Dr. Huang's supervision.
  • Collaboration between the teams has produced several published papers focusing on heterogeneous graph neural networks and EHR genomic information using large language models.
  • Graph representation learning addresses the challenge of transforming irregular graph data into readable formats for predictors, unlike regular structures in images and text that use convolution and attention mechanisms.
  • Over-smoothing problem in graph convolutional networks occurs when multiple layers cause all nodes to have indistinguishable representations due to repeated low-pass filtering effects.
  • Continuous graph neural networks using second-order dynamics were proposed to solve the over-smoothing limitation by incorporating velocity and acceleration terms similar to a spring-mass system.
  • Second-order differential equations prevent nodes from overlapping in representation space, ensuring different final representations and avoiding the over-smoothing problem that affects first-order diffusion processes.
  • Self-supervised learning approach using subgraph-level proximity was developed to handle unlabeled big data by comparing node similarities within sampled subgraphs rather than relying on manual first-order or second-order proximity definitions.
  • Wasserstein distance computation enables end-to-end trainable optimization for comparing subgraph representations through differentiable convex optimization and KKT conditions.
  • Brain connectome analysis leverages the small-world property of brain networks to predict depression scores, with random graph generators serving as auxiliary tasks for graph neural network training.
  • Graph-level representations for brain connectivity data showed improved performance when preserving small-world properties compared to fully connected graph approaches in depression prediction tasks.
  • Deep metric learning for image retrieval was reformulated as a graph construction problem where images become nodes and similarity relationships form edges between same-category clusters.
  • Unified loss function combining pairwise and proxy-based methods achieved superior performance across CUB, SOP, and In-Shop datasets by approximating pairwise loss while maintaining computational efficiency.
  • Transformer architectures avoid over-smoothing in language processing because each input creates a different fully connected graph context, unlike fixed graph structures in traditional graph neural networks.
  • Future research directions include integrating graph neural network principles with large language models and developing graph foundation models for single-cell data analysis applications.


Summary

Research-in-Progress Seminar Introduction and Background

  • Dr. Yanfu Zhang was introduced as an assistant professor in the Computer Science Department at William & Mary who received his Ph.D. in Electrical and Computer Engineering from UTD under Dr. Huang's supervision

  • His research focuses on machine learning, data mining, natural language processing, and computer vision with recent biomedical applications
  • Collaboration with UNM involves heterogeneous graph neural networks and EHR genomic information using large language models
  • Multiple papers have been published jointly, with further work currently in progress.


AI Research Programs and Collaboration Opportunities

  • A research symposium on main campus featured AI talks and health science AI presentations that some participants were unaware of

  • Chris Amos mentioned awareness of the event but had scheduling conflicts preventing attendance
  • A $3 million grant was received for creating a fellowship program that recruited approximately ten students in their first cohort
  • Certificate programs in AI are being developed that could potentially be cross-listed or shared between institutions
  • Spring semester launch is being considered instead of the originally planned summer start for new programs.


Representation Learning Fundamentals and Applications

  • Machine learning has achieved remarkable success across various domains including AlphaGo beating human champions, AlphaFold predicting protein structures, and ChatGPT demonstrating powerful language capabilities

  • A common feature of successful approaches is a task-specific architecture in which a simple predictor sits on top of a complex representation learning component
  • Benefits include allowing researchers to focus on specific representation learning problems rather than exploiting theoretical predictor structures
  • Examples demonstrate how humans process visual information by creating representations that enable easy question answering about images.


Graph Neural Networks and Their Challenges

  • Graphs serve as powerful tools for describing diverse data forms including social networks, brain connections, protein interactions, and even images and text as special graph types

  • Major applications include node classification, link prediction, and community detection for understanding complex relationships
  • Irregularity presents the primary challenge as graphs lack the regular structure that enables convolution and attention mechanisms in images and text
  • Combining neighbor information with node features while handling different numbers and types of edges creates significant algorithmic design difficulties, as sketched below.
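
A minimal sketch of the difficulty described above, assuming NumPy; the toy graph and the helper name aggregate_neighbors are illustrative, not from the talk. Each node has a different number of neighbors, so neighbor features have to be aggregated before they can be combined with the node's own features.

    import numpy as np

    # Toy graph: 4 nodes with 3-dimensional features and an irregular edge list.
    X = np.random.rand(4, 3)                      # node feature matrix
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]      # undirected edges

    # Adjacency lists: every node may have a different number of neighbors.
    neighbors = {i: [] for i in range(4)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def aggregate_neighbors(X, neighbors):
        """Average each node's neighbor features and concatenate them with the
        node's own features -- the basic neighbor-aggregation step."""
        out = []
        for i in range(X.shape[0]):
            nbr = X[neighbors[i]].mean(axis=0) if neighbors[i] else np.zeros(X.shape[1])
            out.append(np.concatenate([X[i], nbr]))
        return np.stack(out)

    H = aggregate_neighbors(X, neighbors)         # shape (4, 6)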


Graph Convolution Networks and Over-smoothing Problem

  • Graph convolution networks simplify complex spectral graph convolution through first-order approximation using normalized Laplacian operations

  • This approach effectively averages node features while adding activation functions, significantly reducing computational complexity compared to spectral methods
  • Over-smoothing occurs when multiple layers act as repeated low-pass filters, causing high-frequency signals to drop to zero (see the sketch after this list)
  • Experimental evidence shows model performance decreasing rapidly as layer depth increases, contradicting typical deep learning benefits.
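
The propagation rule and the over-smoothing effect can be illustrated with a short NumPy sketch. The toy 4-node graph is made up, and the weight matrix is fixed to the identity so only the filtering effect of the normalized adjacency S = D^{-1/2}(A + I)D^{-1/2} is visible; this is a generic GCN illustration, not code from the talk.

    import numpy as np

    # Toy graph adjacency and 2-dimensional node features.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = np.random.rand(4, 2)

    # First-order GCN propagation: H' = relu(S @ H @ W) with S = D^{-1/2}(A+I)D^{-1/2}.
    A_hat = A + np.eye(4)                         # add self-loops
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))           # symmetric normalization

    def gcn_layer(H, W):
        return np.maximum(S @ H @ W, 0.0)         # neighborhood averaging + ReLU

    # Stacking many layers applies the same low-pass filter over and over.
    H = X
    for _ in range(20):
        H = gcn_layer(H, np.eye(2))               # identity weights isolate the filter

    # After many layers every node's row points in (almost) the same direction,
    # i.e. the representations have become indistinguishable (over-smoothing).
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    print(Hn @ Hn.T)                              # pairwise cosine similarities -> ~1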


Continuous Graph Neural Networks and Second-Order Dynamics

  • Continuous neural networks replace discrete layers with differential equations, allowing infinite depth through integral formulations

  • Graph-specific continuous networks use partial differential equations that consider both feature changes and graph structure
  • Heat diffusion processes still cause over-smoothing as they represent Gaussian kernels leading to representation blurring
  • Second-order dynamics using spring-mass systems prevent node overlap by maintaining different steady-state positions for connected nodes, as sketched below.
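
The contrast between first-order diffusion and second-order (spring-mass style) dynamics can be shown with a simple numerical integration. This is a conceptual sketch under stated assumptions (toy graph, semi-implicit Euler steps, arbitrary damping constant), not the formulation presented in the talk.

    import numpy as np

    # Toy graph Laplacian L = D - A.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A

    X0 = np.random.rand(4, 2)
    dt, gamma, steps = 0.01, 0.1, 2000

    # First-order diffusion dH/dt = -L H (heat equation): pure smoothing.
    H1 = X0.copy()
    for _ in range(steps):
        H1 = H1 - dt * (L @ H1)

    # Second-order dynamics d^2H/dt^2 = -L H - gamma * dH/dt (spring-mass analogy),
    # integrated with explicit velocity and position updates.
    H2, V = X0.copy(), np.zeros_like(X0)
    for _ in range(steps):
        V = V + dt * (-(L @ H2) - gamma * V)      # acceleration -> velocity
        H2 = H2 + dt * V                          # velocity -> position (representation)

    print(np.std(H1, axis=0))   # diffusion: spread across nodes shrinks toward zero
    print(np.std(H2, axis=0))   # second-order: spread stays visibly larger over the same horizon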


Self-Supervised Learning and Graph Contrastive Methods

  • Self-supervised learning addresses the challenge of limited labeled data by defining pretext tasks for representation learning

  • Graph contrastive learning constructs representations by learning similarity and dissimilarity between node subgraphs (a generic contrastive-loss sketch follows this list)
  • Traditional proximity measures like first-order and second-order connections cannot cover all node comparison cases effectively
  • Subgraph-level proximity uses graph matching to compare similarity between nodes within defined neighborhood ranges.
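
A generic contrastive objective makes the "similar vs. dissimilar" idea concrete. The sketch below (NumPy; the embedding sizes and InfoNCE-style loss are assumptions for illustration) pulls an anchor toward a positive view and away from negatives; the subgraph-level proximity with graph matching described in the talk would replace the simple cosine similarity used here.

    import numpy as np

    def contrastive_loss(anchor, positive, negatives, tau=0.1):
        """InfoNCE-style loss: the anchor should be closer to its positive view
        (e.g. a sampled subgraph of the same node) than to any negative."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
        logits -= logits.max()                    # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return -np.log(probs[0])                  # index 0 is the positive pair

    rng = np.random.default_rng(0)
    z_anchor    = rng.normal(size=16)                      # one view of a node's subgraph
    z_positive  = z_anchor + 0.1 * rng.normal(size=16)     # another view of the same subgraph
    z_negatives = [rng.normal(size=16) for _ in range(5)]  # subgraphs of other nodes
    print(contrastive_loss(z_anchor, z_positive, z_negatives))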


Scalability Solutions and Wasserstein Distance Integration

  • Big graph data with millions of nodes and edges cannot be processed using traditional matrix multiplication approaches

  • Self-supervised methods enable pre-training without labels followed by fine-tuning on downstream tasks
  • Wasserstein distance computation provides flexible node matching between subgraphs of different sizes through weighted similarity optimization (a common differentiable variant is sketched after this list)
  • End-to-end trainable approaches using differentiable optimization layers avoid iterative solving of the matching problem.
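
The matching step in the talk is solved with differentiable convex optimization via KKT conditions. As a point of reference only, the sketch below shows a widely used alternative: entropy-regularized optimal transport solved with Sinkhorn iterations, which is also differentiable and handles subgraphs of different sizes. Uniform node weights and the cost choice are assumptions.

    import numpy as np

    def sinkhorn_distance(X, Y, eps=0.1, n_iter=200):
        """Entropy-regularized Wasserstein distance between two sets of node
        embeddings (rows of X and Y); the two subgraphs may differ in size."""
        C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)   # pairwise costs
        a = np.full(X.shape[0], 1.0 / X.shape[0])     # uniform weights, subgraph 1
        b = np.full(Y.shape[0], 1.0 / Y.shape[0])     # uniform weights, subgraph 2
        K = np.exp(-C / eps)
        u = np.ones_like(a)
        for _ in range(n_iter):                       # Sinkhorn fixed-point updates
            v = b / (K.T @ u)
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]               # soft node-to-node matching
        return (P * C).sum()

    X = np.random.rand(5, 8)                          # embeddings of a 5-node subgraph
    Y = np.random.rand(7, 8)                          # embeddings of a 7-node subgraph
    print(sinkhorn_distance(X, Y))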


Graph-Level Representation Learning for Brain Connectomes

  • Brain connectomes represent connections between different brain regions and require graph-level rather than node-level analysis

  • The small-world property characterizes brain networks through short paths between nodes, similar to those produced by random graph generators
  • Graph neural networks learn node representations that are pooled to create graph-level representations for predicting neural conditions (see the sketch after this list)
  • Auxiliary tasks using random graph parameter estimation help preserve important structural properties during representation learning.
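
A minimal sketch of the graph-level pipeline: node embeddings are pooled into one vector, and an auxiliary head predicts a random-graph generator parameter so that the pooled representation has to retain small-world structure. All names, sizes, and the choice of a Watts-Strogatz rewiring probability as the auxiliary target are illustrative assumptions; training is not shown.

    import numpy as np

    def graph_level_representation(H):
        """Mean-pool node embeddings (rows of H) into a single graph-level vector."""
        return H.mean(axis=0)

    rng = np.random.default_rng(0)
    H = rng.normal(size=(90, 32))            # e.g. 90 brain-region nodes, 32-dim embeddings
    g = graph_level_representation(H)        # graph-level representation, shape (32,)

    # Two heads share the pooled representation:
    #   main head      -> clinical outcome (e.g. depression score)
    #   auxiliary head -> small-world generator parameter
    #                     (e.g. a Watts-Strogatz rewiring probability)
    W_main, W_aux = rng.normal(size=32), rng.normal(size=32)
    score_pred  = W_main @ g                              # predicted score
    rewire_pred = 1.0 / (1.0 + np.exp(-(W_aux @ g)))      # predicted probability in (0, 1)

    # Training (omitted) would minimize  loss_main + lambda * loss_aux, so the
    # auxiliary task pushes the representation to preserve small-world structure.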


Deep Metric Learning and Graph Construction

  • Image retrieval problems can be formulated as graph construction where similar images form dense clusters

  • Each image becomes a node with edges representing similarity, creating ideal community structures for different categories
  • Pairwise methods consider positive and negative pairs while proxy-based methods use average category representations (both are sketched after this list)
  • Unified graph-based loss functions combine benefits of both approaches, achieving accuracy of pairwise methods with speed of proxy-based methods.
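
The pairwise vs. proxy distinction can be made concrete with two toy loss functions (NumPy; the margin, temperature, and dimensions are illustrative assumptions). The unified graph-based loss from the talk, which approximates the pairwise loss at proxy-like cost, is not reproduced here.

    import numpy as np

    def pairwise_loss(z1, z2, same_class, margin=0.5):
        """Contrastive pairwise loss: pull same-class embeddings together,
        push different-class embeddings at least `margin` apart."""
        d = np.linalg.norm(z1 - z2)
        return d ** 2 if same_class else max(0.0, margin - d) ** 2

    def proxy_loss(z, proxies, label, tau=0.1):
        """Proxy-based loss: compare an embedding to one learned proxy per class
        instead of to every other image (far fewer comparisons than all pairs)."""
        sims = proxies @ z / (np.linalg.norm(proxies, axis=1) * np.linalg.norm(z))
        logits = sims / tau
        logits -= logits.max()                    # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return -np.log(probs[label])

    rng = np.random.default_rng(0)
    z_a, z_b = rng.normal(size=64), rng.normal(size=64)   # two image embeddings
    proxies = rng.normal(size=(10, 64))                   # one proxy per category
    print(pairwise_loss(z_a, z_b, same_class=True))
    print(proxy_loss(z_a, proxies, label=3))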


Integration with Large Language Models and Future Directions

  • Transformers in language processing operate on fully connected graphs but avoid over-smoothing due to varying contexts and large model sizes (see the attention sketch after this list)

  • Over-smoothing may still occur with very long contexts when information becomes saturated across attention layers
  • Graph foundation models present opportunities for integration with fine-tuning approaches and multi-modal data modeling
  • Single-cell data analysis represents an emerging application area with significant potential for graph neural network methods.
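
A small NumPy sketch of single-head self-attention illustrates the first bullet above: the softmax matrix acts as a dense, input-dependent weighted "adjacency" over the tokens, rebuilt for every input, unlike the fixed adjacency a graph neural network reuses. Toy sizes and the single-head form are assumptions.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Single-head self-attention: the softmax matrix is a dense,
        input-dependent weighted adjacency over the tokens."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[1])
        scores -= scores.max(axis=1, keepdims=True)         # numerical stability
        A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        return A, A @ V                                     # "graph" and aggregated features

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 16))                            # 6 tokens, 16-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
    A, out = self_attention(X, Wq, Wk, Wv)
    # Unlike a fixed GNN adjacency, A changes with every input sequence, which is
    # one informal reason attention stacks resist over-smoothing.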


Next Steps

  • Continued collaboration between UNM and William & Mary will focus on addressing graph neural network challenges in biomedical applications

  • Integration of graph methods with large language models will be explored for enhanced performance
  • Certificate program development and course cross-listing opportunities will be pursued for spring semester implementation.
