Summary
Focus: AI/ML for medical applications using imaging and multi-omics
Project 1: Can ML cause a reproducibility crisis in science?
Analyzed prior modeling research on Alzheimer’s disease
Identified many cases of data leakage in their analysis
Trained CNNs on the imaging data
Separated training/validation data from an independent test set, which was only used after the 2nd round of peer review
Compared a 3D CNN against a simpler linear SVM and showed that their accuracy is similar
Evaluated extrapolation accuracy across different datasets, showing reduced accuracy when the data distribution shifts
Demonstrated that
Splitting data by image rather than by patient results in overfitting;
Robust models require splitting by patient: all images from one patient go to either training or validation, never both
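The patient-level split described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the project: the function name `split_by_patient`, the `val_fraction` parameter, and the toy patient IDs are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) such that no patient appears in both sets.

    patient_ids[i] is the (hypothetical) patient identifier of image i.
    """
    # Group image indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients, not images, so a patient is assigned as a whole.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_val = max(1, round(val_fraction * len(patients)))
    val_patients = set(patients[:n_val])

    train_idx = [i for p in patients if p not in val_patients for i in by_patient[p]]
    val_idx = [i for p in val_patients for i in by_patient[p]]
    return sorted(train_idx), sorted(val_idx)


# Toy data (assumed): 6 images from 3 patients.
image_patient_ids = ["p1", "p1", "p2", "p2", "p2", "p3"]
train_idx, val_idx = split_by_patient(image_patient_ids, val_fraction=0.34)

train_patients = {image_patient_ids[i] for i in train_idx}
val_patients = {image_patient_ids[i] for i in val_idx}
```

Shuffling image indices directly would instead let two scans of the same patient land on opposite sides of the split, which is the leakage pattern the notes warn about.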
Project 2: Heterogeneous dynamics of disease
Using multiple data modalities: MRI, genetics
Clustering disease trajectory signatures
MAGIC: Multi-scale heterogeneity analysis and clustering
Clustering of brain images to identify disease subtypes
Spatially breaking down the brain into feature clusters at different degrees of spatial resolution
Looking at late-life depression
Projecting measurements into a low-dimensional subspace
Tracking patients’ disease progress over time
GAN-based methods for mapping disease heterogeneity
The discriminator compresses imaging data into a low-dimensional latent space
Clustered these latent vectors, which encode the propagation of dementia, into 5 subtypes based on brain imaging and symptoms
Subtypes have distinct genetic markers
Project 3: Linking imaging with genetics
MuSIC: Multi-scale structural imaging covariance atlas
Genome-wide association between genetic features and spatial clusters in images
Genetic architecture of multi-modal brain age
Project 4: multi-omics and multi-organ modeling of human aging and disease
Associating 9 phenotype-based aging clocks
Relating phenotypic and genetic correlations between different disease phenotypes
The two are often related
However, where environmental factors are dominant, these correlations may point in different directions
Biological aging clocks
Idea:
Train model to predict chronological age from some type of feature
Look at which features are predictive
11 proteome-based ProBAGs
Plasma proteomics data
Can do age-bias correction
Models tend to regress towards the mean age (~45 yo)
Distorts results for diseased patients (e.g. the predicted age of diseased patients comes out younger than their chronological age)
Correction methods undo this bias by explicitly conditioning on diseased/healthy populations
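To make the bias concrete, here is a minimal sketch of one common linear correction, assumed here for illustration and not necessarily the method used in the talk. It fits the age gap (predicted minus chronological age) against chronological age on healthy controls only, then subtracts that fitted trend from everyone's gap. The function names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_age_bias(chron_age, pred_age):
    """Fit the age-dependent trend of the gap (predicted - chronological)."""
    gaps = [p - a for p, a in zip(pred_age, chron_age)]
    return fit_line(chron_age, gaps)


def corrected_gap(chron_age, pred_age, slope, intercept):
    """Raw gap minus the trend expected at that chronological age."""
    return [(p - a) - (slope * a + intercept) for p, a in zip(pred_age, chron_age)]


# Assumed toy data: healthy controls whose predictions regress toward age 45.
healthy_age = [25, 35, 45, 55, 65]
healthy_pred = [35, 40, 45, 50, 55]
slope, intercept = fit_age_bias(healthy_age, healthy_pred)

# Assumed patient: 65 years old, predicted 60, so the raw gap is -5 ("younger").
patient_gap = corrected_gap([65], [60], slope, intercept)[0]
```

In this toy case a healthy 65-year-old would be predicted at 55, so a patient predicted at 60 is older than expected for their age: the raw gap of -5 becomes a corrected gap of +5.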
Organ-specific aging clocks
Observation: predicting clinical age is not that useful; the models need to predict the actual disease state/diagnosis