Summary
Liquid-Liquid Phase Separation (LLPS)
Thermodynamically driven, reversible de-mixing of a (binary) solution into 2 distinct liquid phases
Many molecules tend to stick together with others of their own kind
Higher temperature -> molecules are forced to jump around the substance, away from their own kind, don’t form clumps
Higher density -> many travel paths are blocked, makes it more likely that molecules stay in place, form clumps (liquid droplets) of their own kind
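The temperature dependence above can be illustrated with a regular-solution (Flory-Huggins-like) free energy of mixing. This is a textbook toy model, not something taken from these notes: the names `mixing_free_energy`, `is_unstable`, and the interaction parameter `chi` (assumed to grow as temperature drops) are illustrative assumptions.

```python
import math

def mixing_free_energy(phi, chi):
    """Regular-solution free energy of mixing per site, in units of kT.

    phi: volume fraction of one component (0 < phi < 1)
    chi: effective like-likes-like interaction strength (assumed ~ 1/T)
    """
    entropy_term = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    interaction_term = chi * phi * (1 - phi)
    return entropy_term + interaction_term

def is_unstable(phi, chi):
    """True if the mixed state is locally unstable (negative curvature)."""
    curvature = 1 / phi + 1 / (1 - phi) - 2 * chi  # second derivative in phi
    return curvature < 0

# Hypothetical values: small chi ~ high temperature, large chi ~ low temperature
high_T_demixes = is_unstable(0.5, chi=1.5)
low_T_demixes = is_unstable(0.5, chi=2.5)
print(high_T_demixes, low_T_demixes)
```

In this toy model a symmetric mixture stays mixed while `chi` is below 2 and separates into two phases above it, which mirrors the "higher temperature prevents clumps" statement.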
Key interest: protein solutions
This process occurs in cells, creating compartments/droplets within them
There are many biological processes that involve packaging of many protein molecules into droplets: Biological Condensates
Major drivers
Intrinsically Disordered Proteins (IDPs)
RNA
Intrinsically Disordered Proteins
Lack stable tertiary structure
⅓ of human genome, ⅔ of cancer-associated proteins
Described as dynamic and likely heterogeneous assemblies
Fold upon binding, self-assembly, phase separation
Physics-based Molecular Modeling and Simulation
Components:
Energy Function (force field)
Calculation of Dynamics (sampling)
Can be modeled at many levels of detail/abstraction
Classical energy functions derived from the underlying quantum dynamics
Dynamics:
Molecular Dynamics (MD): mechanistic motion of molecules
Monte Carlo sampling of likelihood of various structural moves based on transition energy
MD tends to be more efficient
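A minimal sketch of the Monte Carlo idea above, assuming a Metropolis acceptance rule and a 1D coordinate as a stand-in for a conformation. The function `metropolis_sample`, the `harmonic` test energy, and all parameter values are hypothetical choices for illustration, not from the notes.

```python
import math
import random

def metropolis_sample(energy, x0, n_steps, step_size=1.0, kT=1.0, seed=0):
    """Metropolis Monte Carlo on a single coordinate x."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        # Propose a random structural move
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / kT)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move repeats the current state
    return samples

def harmonic(x):
    """Toy energy function: a harmonic well with unit force constant."""
    return 0.5 * x * x

samples = metropolis_sample(harmonic, x0=0.0, n_steps=50000)
mean_x2 = sum(s * s for s in samples) / len(samples)
print(mean_x2)  # should be close to kT / k = 1 for this well
```

MD would instead integrate equations of motion from forces; the sampling target (the Boltzmann distribution) is the same.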
Energy function -> Trajectory of conformations (probability distribution of possibilities) -> Free energy of each conformation -> Thermodynamic properties
But, extremely expensive
Timestep 1-2 fs (target timescale >> μs, i.e. well over 5×10⁸ steps per trajectory)
>>10⁵ atoms
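The step from sampled conformations to free energies can be sketched as a Boltzmann inversion, F_i = -kT ln p_i, where p_i is the fraction of trajectory frames in state i. The state labels and frame counts below are made up for illustration, and `free_energies` is a hypothetical helper.

```python
import math

def free_energies(counts, kT=1.0):
    """Relative free energy of each state from its population.

    counts: dict mapping state label -> number of trajectory frames
    Returns F_i = -kT * ln(p_i), shifted so the most populated state is 0.
    """
    total = sum(counts.values())
    f = {s: -kT * math.log(c / total) for s, c in counts.items() if c > 0}
    f_min = min(f.values())
    return {s: v - f_min for s, v in f.items()}

# Hypothetical frame counts from a trajectory clustered into three states
counts = {"compact": 7000, "extended": 2500, "helical": 500}
F = free_energies(counts)
for state, value in F.items():
    print(state, round(value, 3))
```

Rarely visited states have large statistical error in their counts, which is one reason long, expensive trajectories are needed.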
Molecular simulations of LLPS: Cα-model
Take a protein, focus on the shape of its amino acid backbone
Break it up into a sequence of segments that are treated as coarse “balls”
Interactions between these balls are modeled like in molecular dynamics, just coarser
Choice of this coarsening is the key challenge and a major focus of research: manual/intuition, physics-based, data-driven
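A rough sketch of what a coarse bead-chain energy function might look like, assuming harmonic bonds between neighboring beads and a Lennard-Jones-style term between non-neighbors. This is not the functional form of any specific published model; `ca_model_energy` and all parameter values are illustrative assumptions (0.38 nm is the typical spacing between consecutive Cα atoms).

```python
import math

def ca_model_energy(coords, k_bond=100.0, r0=0.38, epsilon=1.0, sigma=0.6):
    """Toy energy of a chain of beads, one bead per segment.

    coords: list of (x, y, z) bead positions in nm
    k_bond, r0: harmonic bond stiffness and rest length (assumed values)
    epsilon, sigma: Lennard-Jones well depth and bead size (assumed values)
    """
    n = len(coords)
    energy = 0.0
    # Bonded term: consecutive beads are held near the rest length
    for i in range(n - 1):
        d = math.dist(coords[i], coords[i + 1])
        energy += 0.5 * k_bond * (d - r0) ** 2
    # Non-bonded term: all pairs at least two beads apart along the chain
    for i in range(n):
        for j in range(i + 2, n):
            d = math.dist(coords[i], coords[j])
            sr6 = (sigma / d) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# A straight three-bead chain at the rest bond length
chain = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0)]
e_chain = ca_model_energy(chain)
print(e_chain)
```

In a parameter-learning setting, values such as `epsilon` and `sigma` (possibly per residue type) would be the quantities being tuned.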
CALVADOS protein model
Bayesian Parameter-Learning Procedure
Choose different coarsenings and try them out in simulation, repeating until the interaction parameters are found for which the coarse model consistently gives an accurate approximation of the fine-grained model
ALBATROSS: ML Prediction of IDP Dimensions
Train a Recurrent Neural Net on a Cα-model
Use the neural network for design
idpSAM: Generative Modeling of IDP Ensembles
Use atomic model to generate many ensembles
Train generative model
Generate realistic new ensembles from model -> sample conformational space
Major challenge:
Limited accuracy of trajectories from atomistic simulation because it is so expensive to run at full accuracy
Opportunity to use ML to speed up atomistic simulations
Protein Backbone Structures and Interactions
IDPs are not always random coils!
Transient structures are an important part of how condensates of IDPs form
Ongoing work: Hybrid Resolution Protein Model