A central question when designing an AI system in the real world is how
to learn representations of the data that make it easier to extract
useful information when building classifiers or other predictors.
Satisfactory answers exist for certain kinds of data, such as images,
text, or audio. However, many types of data are collected as graphs,
such as social networks, protein interaction networks, and brain
connectomes. Despite the emergence of powerful graph neural network
techniques, researchers have yet to agree on how best to learn
embeddings from graphs, owing to their high irregularity, complexity,
and sparsity. My work focuses on addressing these critical challenges
in graph representation learning.
In this talk, I will first present my recent results on two key
problems in graph representation learning. i) A significant
limitation of the widely used graph convolutional network is that
embeddings become over-smoothed as the network gets deeper. ii) Scaling
to big data, though facilitated by self-supervised pretraining, comes at
the cost of losing focus on local
structure. After that, I will broaden the scope beyond individual nodes
and discuss two considerations that affect algorithm design. i) Some
applications call for graph-level rather than node-level
embeddings. ii) The data distribution implies a graph structure, even if it
is not explicitly given. Beyond improving the efficacy of
representation learning, I have also designed fairness-aware machine
learning algorithms to tackle bias in model training and data processing.
I have successfully applied these representation learning methods to
real-world problems such as early diagnosis of brain disease,
drug repositioning, and social media network prediction.