Machine learning methods for peptide, protein and antibody design | 9am Tues, Nov 7


Grigory Bronevetsky

Nov 3, 2023, 6:38:36 PM11/3/23
to ta...@modelingtalks.org

Modeling Talks

Machine learning methods for peptide, protein and antibody design

Philip Kim @ UToronto


Tuesday, Nov 7 | 9am PT

Meet | YouTube Stream


Hi all,


The presentation will be via Meet and all questions will be addressed there. If you cannot attend live, the event will be recorded and can be found afterward at
https://sites.google.com/modelingtalks.org/entry/machine-learning-methods-for-peptide-protein-and-antibody-design


Abstract:
I will cover machine learning methods developed in my lab for protein sequence design, de novo protein design and antibody design, making use of graph neural networks and diffusion models. We have also developed methods to model the conformational dynamics of peptides using PepFlow, a sequence-conditioned Boltzmann-generator model. We show that PepFlow achieves state-of-the-art performance for peptide structure prediction and matches experimental conformational ensembles.


Bio:
Philip M. Kim is a professor at the University of Toronto at the Donnelly Centre and the Departments of Computer Science and Molecular Genetics. In his academic research, he has been developing novel machine learning methods for protein and peptide engineering and authored over 90 publications, 7 invention disclosures and 5 patent applications. He has co-founded several biotechnology companies and serves as consultant and member of the scientific advisory board for others. Before setting up his lab in 2009, he was a postdoctoral fellow at Yale University and an associate with McKinsey & Co. He holds a Ph.D. from the Artificial Intelligence Laboratory and Department of Chemistry at the Massachusetts Institute of Technology and a B.S. in Physics and Biochemistry from the University of Tuebingen.


More information on previous and future talks: https://sites.google.com/modelingtalks.org/entry/home

Grigory Bronevetsky

Nov 24, 2023, 1:31:21 AM11/24/23
to Talks, Grigory Bronevetsky
Video Recording: https://youtu.be/2r3gyv660g8
Slides: https://drive.google.com/file/d/1ylE6HNrdXgvI2PIJ27SeH_O8boMCzTy2/view?usp=sharing

Summary

  • Big data and modern compute hardware now power AI systems that enable novel biological discoveries

  • Focus: structure-based de-novo design of protein-based therapeutics

    • Structure-based reasoning:

      • Efficient graph-based representation of 3D protein structure

      • Attention/transformer models

      • Transfer learning

      • Diffusion models

    • Sequence-based reasoning

      • Protein language models

  • Fable therapeutics: application of lab’s research work

  • Graphs: natural way to represent molecular structure

    • Encode distance and bond relationships between atoms

    • Do not require discretization

    • Alternative: 3D coordinates/voxels are not rotation invariant

    • Now superior approaches have been developed (e.g. Fable-RE)
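      The graph idea above can be sketched in a few lines. This is a minimal illustration (not code from the talk): residues become nodes, and edges connect residues within a distance cutoff, so the representation depends only on pairwise distances and is unchanged by rigid rotations — the invariance that raw coordinates and voxel grids lack. The 8 Å cutoff and the toy coordinates are assumptions for the example.

      ```python
      import numpy as np

      def structure_to_graph(coords, cutoff=8.0):
          """Build a residue-contact graph from 3D coordinates.

          Nodes are residues; an edge connects two residues whose
          representative atoms lie within `cutoff` angstroms. Edge
          features (pairwise distances) are rotation- and
          translation-invariant, unlike raw coordinates or voxels.
          """
          coords = np.asarray(coords, dtype=float)
          diff = coords[:, None, :] - coords[None, :, :]
          dist = np.linalg.norm(diff, axis=-1)                 # (N, N) distances
          adj = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
          return adj, dist

      # Toy 3-residue "structure": distances survive an arbitrary rotation.
      coords = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
      adj, dist = structure_to_graph(coords)
      theta = 0.7
      R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
      adj_rot, dist_rot = structure_to_graph(coords @ R.T)
      ```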

  • PROTEINSOLVER: inverse folding using graph neural networks (https://www.sciencedirect.com/science/article/pii/S2405471220303276)

    • Inverse folding: find sequence that produces given 3d structure

      • Given a target structure graph and a training set of structure-sequence pairs

      • Infers a sequence whose residue interactions are consistent with the graph, and hence with the 3D structure

    • Newer methods: ESM-IF, ProteinMPNN (https://www.science.org/doi/10.1126/science.add2187)
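    A generic message-passing step of the kind these inverse-folding GNNs build on can be sketched as follows. This is a hedged toy (random weights, a chain-shaped contact graph) — not the ProteinSolver or ProteinMPNN architecture — showing only the core pattern: node embeddings aggregate neighbour messages over the structure graph, then decode to per-residue amino-acid probabilities.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N_RES, N_AA, HID = 5, 20, 16      # residues, amino-acid alphabet, hidden size

    # Toy structure graph: a simple chain of residue contacts.
    adj = np.zeros((N_RES, N_RES))
    for i in range(N_RES - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0

    # Stand-ins for learned weights (random here, illustration only).
    W_self = rng.normal(size=(HID, HID))
    W_msg = rng.normal(size=(HID, HID))
    W_out = rng.normal(size=(HID, N_AA))

    h = rng.normal(size=(N_RES, HID))            # initial node embeddings

    # One message-passing round: each residue averages its neighbours.
    deg = adj.sum(axis=1, keepdims=True)
    msgs = (adj @ h) / np.maximum(deg, 1.0)
    h = np.tanh(h @ W_self + msgs @ W_msg)

    # Decode per-residue amino-acid probabilities (the inferred sequence).
    logits = h @ W_out
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    ```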

  • PepNN: accurate predictor of peptide binding sites (https://www.nature.com/articles/s42003-022-03445-2)

    • Attention modules update the protein and peptide embeddings while enforcing symmetry

    • Final embedding layers predict the peptide binding site

    • Transfer learning across scarce related peptide-protein / antibody-antigen datasets

      • Use fragments of proteins that are likely to bind to expand size of dataset
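    The symmetric attention idea can be illustrated with plain scaled dot-product attention. This is a simplified sketch, not PepNN's actual reciprocal-attention module: a single shared score matrix S drives both directions, so the attention peptide residue i pays to protein residue j is tied to the attention j pays back to i. Dimensions and embeddings are invented.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(1)
    L_PEP, L_PROT, D = 4, 10, 8       # peptide length, protein length, embed dim

    pep = rng.normal(size=(L_PEP, D))    # peptide residue embeddings
    prot = rng.normal(size=(L_PROT, D))  # protein residue embeddings

    # One shared score matrix enforces symmetry between the two directions.
    S = pep @ prot.T / np.sqrt(D)                  # (L_PEP, L_PROT)
    pep_new = softmax(S, axis=1) @ prot            # peptide attends to protein
    prot_new = softmax(S.T, axis=1) @ pep          # protein attends to peptide
    ```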

  • De novo protein design

    • Design protein that satisfies desired properties

    • ProteinSGM: diffusion generative protein model 

    • Protein represented as 2D matrices (inter-residue distances, angles) that are treated as images

    • Almost all generated backbones are designable and lead to real proteins

    • Many are novel

    • Can create a protein Photoshop, where certain key regions are specified and the rest of the protein is generated
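    The "protein Photoshop" idea — pin down a region, generate the rest — amounts to masked (inpainted) generation. The sketch below is a hedged cartoon, not ProteinSGM: a trained denoiser is replaced by a trivial shrink-toward-zero update, and the only point illustrated is the conditioning mechanism, where the user-specified entries of the distance-map "image" are re-imposed at every step so generation fills in only the free region.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    N = 6                                   # residues; the "image" is an NxN map

    # Symmetric toy distance map; the top-left block is user-specified.
    known = rng.uniform(2.0, 10.0, size=(N, N))
    known = (known + known.T) / 2.0
    mask = np.zeros((N, N), dtype=bool)
    mask[:3, :3] = True                     # region the designer pins down

    x = rng.normal(size=(N, N))             # start from pure noise
    for step in range(50):
        # Stand-in for a trained score/denoising model (illustration only).
        x = 0.9 * x + 0.05 * rng.normal(size=(N, N))
        # Inpainting constraint: re-impose the known region every step,
        # so sampling is conditioned on the specified sub-structure.
        x[mask] = known[mask]
    ```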

  • HelixGAN, HelixDiff: full atom peptide generation (https://pubmed.ncbi.nlm.nih.gov/36651657/)

  • Antibody-SGM: diffusion model for full antibody generation (https://icml-compbio.github.io/2023/papers/WCBICML2023_paper143.pdf)

  • Next frontier: reflecting dynamics in ML models

    • Proteins are flexible, adapt and move

    • Need methods that learn the conformational space

      • Physics-based models map out the energy landscape: minima, barriers, etc.

      • Computationally very expensive to describe the entire landscape, even given the energy function

    • Boltzmann generators: use normalizing flows to sample from the molecule’s energy distribution

      • Generator model that generates the entire energy space distribution of the molecule’s conformations

    • PepFlow:

      • Structure model that learns the peptide's structural components conditioned on its sequence

      • Separate sub-models for backbone, sidechain heavy atoms, protonation

      • Generates space of allowable conformations of the sequence

    • Next stage: PepFlow++

      • Adapted PFGM++ architecture

      • Equiformer layers
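    The Boltzmann-generator idea behind this line of work can be shown in miniature. This is a hedged sketch, not PepFlow: the "flow" is a single affine map (real models stack many learned invertible layers), and the energy is a toy 1D double well standing in for a molecular force field. It shows the two essential ingredients — exact sample likelihoods via the change-of-variables formula, and importance reweighting of flow samples toward the Boltzmann density exp(-E(x)).

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def energy(x):
        """Toy double-well energy with minima at x = +/-1."""
        return (x**2 - 1.0)**2

    # A minimal invertible "flow": one affine transform x = mu + sigma * z.
    mu, sigma = 0.0, 1.2

    z = rng.normal(size=100_000)
    x = mu + sigma * z

    # Change of variables: log q(x) = log N(z; 0, 1) - log|sigma|.
    log_q = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(sigma)

    # Importance weights re-target the samples to exp(-E(x)) / Z.
    log_w = -energy(x) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Weighted samples approximate Boltzmann expectations, e.g. <x^2>:
    ex2 = np.sum(w * x**2)
    ```

    With a learned multi-layer flow in place of the affine map, the same reweighting turns generator samples into samples from the molecule's conformational ensemble.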

  • Epigenetic editing

    • Zinc Finger proteins

      • Small proteins

      • Fully human proteins (no immune system reaction)

      • Can bind to DNA, but adjacent binding sites interact, inducing a combinatorial explosion of the design space

    • Approach:

      • Used extensive experimental data to determine which zinc finger structures are compatible

      • Trained transformer-based language model to generate compatible finger sequences
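    The combinatorial-explosion point can be made concrete with a toy enumeration. The finger names and compatibility table below are invented for illustration — the real work distills compatibility from experimental screens and replaces brute-force search with a trained language model — but the sketch shows why enumeration fails: the search space grows exponentially with array length.

    ```python
    import itertools

    # Hypothetical adjacency-compatibility table between zinc-finger
    # modules (values invented; real tables come from experiments).
    FINGERS = ["F1", "F2", "F3", "F4"]
    COMPATIBLE = {
        ("F1", "F2"), ("F2", "F3"), ("F3", "F1"),
        ("F1", "F4"), ("F4", "F2"),
    }

    def compatible_arrays(length):
        """Enumerate finger arrays whose adjacent pairs are all compatible.

        Brute force scans len(FINGERS)**length candidates -- exponential
        in array length, which is what motivates a generative model.
        """
        out = []
        for combo in itertools.product(FINGERS, repeat=length):
            if all((a, b) in COMPATIBLE for a, b in zip(combo, combo[1:])):
                out.append(combo)
        return out

    arrays = compatible_arrays(3)
    ```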

