Summary:
Focus: deep generative models of diseases and their dynamics
Can we simulate diseases?
Can we represent cells in-silico?
Multi-omic variation: Genome, Epigenome, Proteome, Transcriptome, Metabolome
Models:
Early: PCA
Current: Autoencoders, Foundation Models
Next frontier: leverage new under-explored data sources
DOLPHIN
Exons: functional regions of a given gene
Translated into portions of proteins
A single gene can encode multiple proteins depending on which exons are included
DNA -> pre-mRNA: all exons are transcribed (along with the introns)
pre-mRNA -> mRNA: different exons are spliced into specific mRNAs, which are then translated into proteins
Sequencing the mRNAs shows which exons were chosen and the junctions between them
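To make the exon-selection idea above concrete, here is a minimal sketch. The gene, exon names, and sequences are made up for illustration, and sequences are written in the DNA alphabet for simplicity. It shows how two different exon choices from the same gene give two different transcripts, and which exon-exon junctions a read could then span.

```python
# Hypothetical toy gene with four exons; names and sequences are invented.
EXONS = {"E1": "ATGGCC", "E2": "GATTAC", "E3": "CCGGAA", "E4": "TGA"}


def splice(exon_names):
    """Join the chosen exons, in order, into one mature transcript sequence."""
    return "".join(EXONS[name] for name in exon_names)


def junctions(exon_names):
    """List the exon-exon junctions implied by the chosen exon order."""
    return list(zip(exon_names, exon_names[1:]))


isoform_full = ["E1", "E2", "E3", "E4"]
isoform_skip = ["E1", "E3", "E4"]  # exon E2 is skipped

print(splice(isoform_full))     # ATGGCCGATTACCCGGAATGA
print(splice(isoform_skip))     # ATGGCCCCGGAATGA
print(junctions(isoform_skip))  # [('E1', 'E3'), ('E3', 'E4')]
```

A read covering the E1-E3 junction can only come from the isoform that skips E2, which is why junction reads carry information that gene-level counts lose.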
Deep Generative model: encode gene and exon data into embedding
Aggregation of junction reads
Downstream analysis
Cell embedding
Exon-level marker
Alternative splicing
Using exon data makes it possible to detect pancreatic cancer markers missed by gene-count methods
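As a rough illustration of how aggregated junction reads can feed an alternative-splicing analysis, the sketch below pools junction counts over a group of cells and computes percent spliced in (PSI) for a cassette exon, using the common convention of inclusion reads over inclusion plus skipping reads. The function names, the pooling strategy, and the counts are assumptions for illustration, not the method described in the talk.

```python
def pool_junction_reads(cells):
    """Sum junction read counts over a group of cells (e.g. one cluster)."""
    pooled = {}
    for counts in cells:
        for junction, n in counts.items():
            pooled[junction] = pooled.get(junction, 0) + n
    return pooled


def percent_spliced_in(pooled, upstream, downstream, skipping):
    """PSI for a cassette exon: inclusion / (inclusion + skipping) reads.

    Inclusion is the mean of the two junctions flanking the exon.
    Returns None when no informative reads are available.
    """
    inclusion = (pooled.get(upstream, 0) + pooled.get(downstream, 0)) / 2
    exclusion = pooled.get(skipping, 0)
    total = inclusion + exclusion
    return inclusion / total if total > 0 else None


# Hypothetical per-cell junction counts; single cells are sparse on their own.
cells = [
    {("E1", "E2"): 3, ("E2", "E3"): 5},
    {("E1", "E3"): 2},
    {("E1", "E2"): 5, ("E2", "E3"): 3},
]

pooled = pool_junction_reads(cells)
psi = percent_spliced_in(pooled, ("E1", "E2"), ("E2", "E3"), ("E1", "E3"))
print(pooled)
print(psi)  # 0.8
```

Pooling matters because any single cell usually has too few junction reads for a stable PSI estimate.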
MATES: quantifies locus-specific transposable elements in single-cell data
Multi-omic cell representation learning
Integrated modalities: scRNA-seq, snmC-seq, scATAC-seq
Encoded into a latent space and combined
Enables single-cell cross-modal generation
Given some modalities, generate others
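One common way to combine per-modality encodings into a shared latent space, while still coping with missing modalities, is a product of Gaussian experts. The sketch below is an assumption about how such a combination could work, not a description of the speaker's model. Each modality contributes a (mean, variance) estimate for one latent dimension, and the values are hypothetical.

```python
def product_of_experts(experts):
    """Combine Gaussian (mean, variance) estimates of one latent dimension.

    The result is the precision-weighted mean and the combined variance.
    Only the modalities that were actually measured need to be passed in.
    """
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


# Hypothetical encoder outputs for one cell and one latent dimension.
rna_expert = (1.0, 0.5)   # from scRNA-seq
atac_expert = (3.0, 0.5)  # from scATAC-seq

print(product_of_experts([rna_expert, atac_expert]))  # (2.0, 0.25)
print(product_of_experts([rna_expert]))               # (1.0, 0.5)
```

A decoder for an unmeasured modality can then be run on the combined latent value, which is the sense in which some modalities are used to generate others.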
Single-cell genomics is very expensive
$1.5M for 100 samples
vs. bulk sequencing: $18k for 100 samples
Can we generate single-cell from bulk?
Approach: cross-modal generative model
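The cost figures quoted above work out per sample as follows. This is plain arithmetic on the two numbers from the notes.

```python
single_cell_total = 1_500_000  # dollars for 100 samples (from the notes)
bulk_total = 18_000            # dollars for 100 samples (from the notes)
samples = 100

single_cell_per_sample = single_cell_total / samples
bulk_per_sample = bulk_total / samples
cost_ratio = single_cell_total / bulk_total

print(f"single-cell: ${single_cell_per_sample:,.0f} per sample")  # $15,000
print(f"bulk:        ${bulk_per_sample:,.0f} per sample")         # $180
print(f"single-cell costs about {cost_ratio:.0f}x more")          # about 83x
```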
How to represent disease progression in-silico?
Time series with sparse snapshots make it hard to understand how the disease evolves
Need fine temporal resolution
Trying to represent changes in gene expression over time under different conditions
Given a virtual disease model, evaluate the impact of virtual drugs to find ways to bring diseased cells back to a healthy state
Prior methods are based on public data, which is limited, and they require supervised labeling
Want a disease-specific unsupervised model
UNAGI model: https://github.com/mcgilldinglab/UNAGI
Data:
10 healthy donors
9 diseased
231,544 cells
Virtual cell: deep generative model learns cell embedding
Virtual disease: dynamics graph of disease progression in embedding space
Identifies the genes that drive disease progression
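A dynamics graph over an embedding space can be pictured as cell clusters at each disease stage, linked to their most likely predecessor in the previous stage. The sketch below is a simplified stand-in that links clusters by nearest centroid. The stage layout, cluster names, 2-D coordinates, and linking rule are all assumptions for illustration and are not taken from UNAGI.

```python
import math

# Hypothetical cluster centroids in a 2-D embedding, one dict per disease stage.
stages = [
    {"s0_a": (0.0, 0.0), "s0_b": (5.0, 5.0)},
    {"s1_a": (1.0, 0.0), "s1_b": (5.0, 6.0)},
    {"s2_a": (2.0, 1.0), "s2_b": (6.0, 7.0)},
]


def link_stages(stages):
    """Connect each cluster to its nearest cluster in the previous stage."""
    edges = []
    for prev, curr in zip(stages, stages[1:]):
        for name, centroid in curr.items():
            parent = min(prev, key=lambda p: math.dist(prev[p], centroid))
            edges.append((parent, name))
    return edges


for parent, child in link_stages(stages):
    print(f"{parent} -> {child}")
```

Each chain of edges is one candidate progression track, and comparing expression along a track is one way to look for genes that change as the disease advances.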
Impact of virtual drugs on cells
Model the impact of drugs on changes in gene expression
Apply these changes to the disease progression model
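The two steps above can be sketched as: shift a diseased cell's expression by a drug's predicted effect, then score how much closer the result is to a healthy reference. The gene names, expression values, drug effects, and the distance-based score are hypothetical, and the sketch works directly on expression values rather than in a learned embedding.

```python
import math


def apply_drug(expression, drug_effect):
    """Shift each gene by the drug's predicted change, clipping at zero."""
    return {g: max(0.0, x + drug_effect.get(g, 0.0)) for g, x in expression.items()}


def distance(a, b):
    """Euclidean distance between two expression profiles over the same genes."""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in a))


def recovery_score(diseased, healthy, drug_effect):
    """Fraction of the diseased-to-healthy distance removed by the drug."""
    before = distance(diseased, healthy)
    after = distance(apply_drug(diseased, drug_effect), healthy)
    return (before - after) / before


# Hypothetical profiles and drug effects.
healthy = {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 5.0}
diseased = {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 5.0}
drug_x = {"GENE_A": -3.0, "GENE_B": -4.0}  # reverses both disease changes
drug_y = {"GENE_A": -3.0}                  # reverses only one

print(recovery_score(diseased, healthy, drug_x))  # 1.0
print(recovery_score(diseased, healthy, drug_y))  # 0.2
```

Ranking candidate drugs by a score of this kind is the basic idea behind screening virtual drugs against a disease model.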
Model based on single-cell data from patients with IPF (idiopathic pulmonary fibrosis)
Model validated using experimental perturbations
Applied model to predict which drugs are most likely to treat disease
Identified several drugs that are likely to be effective
Tested one candidate (effective and cheap) by applying the drug to diseased cells
The cells showed a reduction in disease symptoms, close to what the model predicted