Summary
Today, big data and compute hardware power AI systems that enable novel biological discoveries
Focus: structure-based de novo design of protein-based therapeutics
Structure-based reasoning:
Efficient graph-based representation of 3D protein structure
Attention/transformer models
Transfer learning
Diffusion models
Sequence-based reasoning:
Protein language models
Fable Therapeutics: application of the lab’s research work
Graphs: natural way to represent molecular structure
Encode distance and bond relationships between atoms
Do not require discretization
Alternative: 3D coordinates/voxels, which are not rotation invariant (see the sketch below)
Superior approaches have since been developed (e.g. Fable-RE)
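A minimal sketch (illustrative, not from the talk) of why graphs sidestep the rotation problem: edges carry pairwise distances, which are unchanged by rotating or translating the coordinates.

```python
# Illustrative sketch: a residue-level graph whose edge features are pairwise
# distances. Distances are invariant to rotation/translation, unlike raw 3D
# coordinates or voxel grids.
import numpy as np

def build_protein_graph(coords: np.ndarray, cutoff: float = 8.0):
    """coords: (N, 3) C-alpha coordinates, one row per residue."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))   # edges within cutoff
    edges = np.stack([src, dst], axis=0)                    # (2, E) edge index
    edge_attr = dist[src, dst]                              # (E,) distance features
    return edges, edge_attr

# Rotating the structure leaves the graph unchanged:
coords = np.random.randn(50, 3) * 10
R = np.linalg.qr(np.random.randn(3, 3))[0]                  # random orthogonal matrix
_, a1 = build_protein_graph(coords)
_, a2 = build_protein_graph(coords @ R.T)
assert np.allclose(a1, a2)                                  # identical edge distances
```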
PROTEINSOLVER: inverse folding using graph neural networks (https://www.sciencedirect.com/science/article/pii/S2405471220303276)
Inverse folding: find a sequence that folds into a given 3D structure
Given a structural graph whose edges encode inter-residue geometry, plus a training set of such graphs
Infers the node labels (amino acid identities) consistent with that 3D structure (sketched below)
Newer methods: ESM-IF, ProteinMPNN (https://www.science.org/doi/10.1126/science.add2187)
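A hedged sketch of the inverse-folding setup in the spirit of ProteinSolver, heavily simplified: the sequence is unknown at input, message passing propagates structural constraints along the graph edges, and a linear head predicts per-residue amino acid logits. Layer sizes and message functions are illustrative assumptions, not the paper’s architecture.

```python
import torch
import torch.nn as nn

class InverseFoldingGNN(nn.Module):
    def __init__(self, hidden=64, n_aa=20, n_layers=3):
        super().__init__()
        self.node_init = nn.Parameter(torch.zeros(hidden))   # no sequence info at input
        self.edge_mlp = nn.Sequential(nn.Linear(2 * hidden + 1, hidden), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.readout = nn.Linear(hidden, n_aa)               # logits over 20 amino acids
        self.n_layers = n_layers

    def forward(self, edges, edge_attr, n_nodes):
        src, dst = edges                                      # (2, E) as in the graph sketch
        h = self.node_init.expand(n_nodes, -1)
        for _ in range(self.n_layers):
            msg = self.edge_mlp(torch.cat([h[src], h[dst], edge_attr[:, None]], dim=-1))
            agg = torch.zeros_like(h).index_add(0, dst, msg)  # sum incoming messages
            h = self.node_mlp(torch.cat([h, agg], dim=-1))
        return self.readout(h)                                # (N, 20) per-residue logits

# e.g. with the numpy graph above:
# logits = InverseFoldingGNN()(torch.as_tensor(edges),
#                              torch.as_tensor(edge_attr, dtype=torch.float32), 50)
```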
PepNN: accurate predictor of peptide binding sites (https://www.nature.com/articles/s42003-022-03445-2)
Attention modules update the protein and peptide embeddings while enforcing symmetry (see the sketch after this list)
Final embedding layers predict the peptide-binding site
Transfer learning across scarce, related peptide-protein / antibody-antigen datasets
Uses protein fragments that are likely to bind to expand the dataset
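A hedged sketch of the symmetric (reciprocal) cross-attention idea: a single peptide-protein similarity matrix is used in both directions, so the two embeddings are updated symmetrically. Projection names and dimensions are illustrative assumptions, not PepNN’s exact module.

```python
import torch
import torch.nn as nn

class ReciprocalAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj_pep = nn.Linear(dim, dim)
        self.proj_prot = nn.Linear(dim, dim)
        self.val_pep = nn.Linear(dim, dim)
        self.val_prot = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, pep, prot):
        # pep: (L_pep, dim) peptide embeddings; prot: (L_prot, dim) protein embeddings
        sim = self.proj_pep(pep) @ self.proj_prot(prot).T * self.scale  # one shared matrix
        pep_new = sim.softmax(dim=-1) @ self.val_prot(prot)    # peptide attends to protein
        prot_new = sim.T.softmax(dim=-1) @ self.val_pep(pep)   # protein attends to peptide
        return pep + pep_new, prot + prot_new                  # symmetric residual updates
```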
De novo protein design
Design a protein that satisfies desired properties
ProteinSGM: diffusion generative protein model
Protein represented as 2D matrices (inter-residue distances and angles) that are treated as images
Almost all generated backbones are designable and lead to real proteins
Many are novel
Can create a “protein Photoshop”, where certain key regions are specified and the rest of the protein is generated (inpainting; sketched below)
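A hedged sketch of the inpainting idea behind “protein Photoshop”: the residue-pair matrices are treated as image channels, and at each reverse-diffusion step the specified region is clamped to its known values while the rest is generated. `denoiser` is a placeholder network, not ProteinSGM’s actual model.

```python
import torch

def inpaint(denoiser, known, mask, n_steps=1000):
    """known: (C, N, N) target matrices; mask: 1 where fixed, 0 where generated."""
    x = torch.randn_like(known)                        # start from pure noise
    for t in reversed(range(n_steps)):
        x = denoiser(x, t)                             # one reverse-diffusion step
        # (a full implementation would clamp to a copy of `known` noised to
        # match step t; shown un-noised here for brevity)
        x = mask * known + (1 - mask) * x              # keep the specified region
    return x
```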
HelixGAN, HelixDiff: full atom peptide generation (https://pubmed.ncbi.nlm.nih.gov/36651657/)
Antibody-SGM: diffusion model for full antibody generation (https://icml-compbio.github.io/2023/papers/WCBICML2023_paper143.pdf)
Next frontier: reflecting dynamics in ML models
Proteins are flexible, adapt and move
Need methods that learn the conformational space
Physics models map out the energy landscape: minima, maxima, transition paths, etc.
Computationally very expensive to describe the entire landscape, even given the energy function
Boltzmann generators: use normalizing flows to sample from the molecule’s Boltzmann (energy-based) distribution
A single generator model covers the full distribution over the molecule’s conformations (see the sketch below)
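A hedged toy sketch of the Boltzmann-generator objective: a normalizing flow x = f(z) is trained by reverse KL, minimizing E_z[u(f(z)) - log|det df/dz|], so that pushing Gaussian noise through the flow approximates p(x) ∝ exp(-u(x)). The one-layer affine flow and double-well energy are toy stand-ins for a real flow (e.g. RealNVP) and a molecular force field.

```python
import torch
import torch.nn as nn

class AffineFlow(nn.Module):                  # toy invertible map x = z * exp(s) + t
    def __init__(self, dim):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(dim))
        self.t = nn.Parameter(torch.zeros(dim))

    def forward(self, z):
        x = z * self.log_s.exp() + self.t
        log_det = self.log_s.sum()            # log|det df/dz| of the affine map
        return x, log_det

def energy(x):                                # toy double-well along dimension 0
    return (x[:, 0] ** 2 - 1.0) ** 2 + 0.5 * (x[:, 1:] ** 2).sum(dim=1)

flow = AffineFlow(dim=4)
opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
for step in range(500):
    z = torch.randn(256, 4)
    x, log_det = flow(z)
    loss = (energy(x) - log_det).mean()       # reverse-KL training objective
    opt.zero_grad(); loss.backward(); opt.step()
# After training, flow(torch.randn(n, 4))[0] yields low-energy conformations.
```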
PepFlow:
Structural model that learns the structural components of the molecule conditioned on its sequence
Separate sub-models for backbone, sidechain heavy atoms, protonation
Generates the space of allowable conformations for the sequence (modular pipeline sketched below)
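A hedged sketch of the modular factorization described above: three conditional sub-models are chained to sample full-atom conformations from a sequence. The interfaces are placeholder assumptions, not PepFlow’s actual API.

```python
# Placeholder interfaces, not PepFlow's real API: each sub-model is assumed
# to expose a .sample(...) method over its structural component.
def sample_ensemble(backbone_model, sidechain_model, protonation_model,
                    sequence, n=100):
    ensemble = []
    for _ in range(n):
        bb = backbone_model.sample(sequence)          # p(backbone | sequence)
        sc = sidechain_model.sample(sequence, bb)     # p(sidechains | backbone, sequence)
        h = protonation_model.sample(bb, sc)          # p(protons | heavy atoms)
        ensemble.append((bb, sc, h))
    return ensemble    # samples covering the allowable conformational space
```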
Next stage: PepFlow++
Adapted PFGM++ architecture
Equiformer layers
Epigenetic editing
Zinc Finger proteins
Small proteins
Fully human proteins (no immune system reaction)
Can bind DNA, but adjacent binding sites interact, inducing a combinatorial explosion of the design space
Approach:
Used extensive experimental data to determine which zinc-finger structures are compatible
Trained a transformer-based language model to generate compatible finger sequences (see the sketch below)
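A hedged sketch of the approach: treat fingers as tokens and train an autoregressive transformer so each new finger is proposed conditioned on the fingers already placed, letting the model absorb inter-finger compatibility from the experimental data. The vocabulary, model sizes, and sampling loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FingerLM(nn.Module):
    def __init__(self, vocab=1000, dim=128, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):                       # tokens: (B, T) finger IDs
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=causal)
        return self.head(h)                          # next-finger logits

@torch.no_grad()
def design(model, start, n_fingers=6):
    tokens = torch.tensor([[start]])
    for _ in range(n_fingers - 1):
        logits = model(tokens)[:, -1]                # condition on fingers placed so far
        nxt = torch.distributions.Categorical(logits=logits).sample()
        tokens = torch.cat([tokens, nxt[:, None]], dim=1)
    return tokens                                    # one compatible finger arrangement
```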