Predicting computational costs

Terra Sztain-pedone

Apr 3, 2020, 6:48:30 PM

to westpa-users

Hello,

I am trying to predict how many total iterations my system will require. For the p53-MDM2 system (~10,000 atoms, tau = 50 ps), 400 iterations were required.

I would like to work with a couple of significantly larger systems. The first, smallest system is ~75,000 atoms. I am using tau = 50 ps, 20 walkers per bin, and 90 bins. I have tried a few binning schemes, 1D and 2D, and I find that I can reach 20% of bins occupied fairly quickly (within a few iterations) but then remain there for up to 20 iterations. I would also like to work with a system of around 1 million atoms soon ...

With 20% of bins occupied, 18 bins x 20 walkers = 360 simulations, I get 36 CPU days per iteration.

Therefore, with 90 bins x 20 walkers = 1800 simulations, I will likely require 180 CPU days per iteration.

I wonder if I should assume 400 iterations: 400 x 180 = 72,000 CPU days?

However, I will not have the maximum number of bins occupied for most of my iterations, so maybe I should assume 36 CPU days x 200 iterations + 180 CPU days x 200 iterations = 43,200 CPU days?
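The arithmetic above can be checked with a short script. This is only a sketch under the assumptions stated in this post: cost scales linearly with the number of simulations per iteration, each simulation costs 36/360 = 0.1 CPU days (from the 18-bin measurement), and every occupied bin holds 20 walkers. The function names and the two-phase schedule are illustrative and are not part of WESTPA.

```python
# Rough cost model built only from the numbers quoted in this post.
# Assumption: 36 CPU days were needed for 360 simulations of tau = 50 ps.
CPU_DAYS_PER_SIMULATION = 36 / 360


def cpu_days_per_iteration(occupied_bins, walkers_per_bin=20):
    """CPU days for one WE iteration with the given number of occupied bins."""
    return occupied_bins * walkers_per_bin * CPU_DAYS_PER_SIMULATION


def total_cpu_days(schedule):
    """Sum the cost over phases given as (occupied_bins, n_iterations) pairs."""
    return sum(cpu_days_per_iteration(bins) * n_iter for bins, n_iter in schedule)


# All 90 bins occupied for all 400 iterations (upper bound).
upper_bound = total_cpu_days([(90, 400)])

# 18 bins occupied for 200 iterations, then 90 bins for 200 iterations.
two_phase = total_cpu_days([(18, 200), (90, 200)])

print(round(upper_bound), round(two_phase))
```

Changing the schedule (for example, a gradual increase in occupied bins) only requires adding more `(occupied_bins, n_iterations)` pairs; the total number of iterations itself remains a guess.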

Let me know if you have a better idea about this, or a resource on system size vs. number of iterations.

Thank you!!!

Terra

Daniel Zuckerman

Apr 3, 2020, 8:14:38 PM

to westpa...@googlegroups.com

Terra, I can make a few general comments about this.

First, it makes sense to start from params used in the literature, but you should experiment. Also, if I understand correctly, your system seems to be getting stuck at a certain point. You can try using finer/different/nested bins at the chokepoint. In general the bins are the key thing that makes WE good or bad.

As to 'how many iterations', that will be very system-specific. And bear in mind that not every system will be amenable to good sampling with WE. See the WE overview on GitHub: https://westpa.github.io/westpa/static/we_overview.pdf, especially the limitations section. You mention a million-atom system - I'd be skeptical that WE (or any method) could reach the necessary timescales for such a system. If you just want to see one instance of a process (which will probably be anomalous because the fastest events are not the most probable - see overview) then maybe WE can do it for you, but it won't be of much value in all likelihood.

You want to think about system timescales. To get some feel for relaxation timescales in complex systems, see our recent JACS paper on protein folding, Jeremy Copperman's preprint on arxiv.org and some recent posts ('exercises') on my blog.

I don't know if you're working with anyone experienced in WE calcs, but perhaps through this user group, you could try to find some folks to whom you could present some of your preliminary calcs and get some feedback. Better to get feedback sooner before doing very expensive calcs. Neither the art nor the science of WE calculation is trivial.

Hope that is helpful. --Dan Z

--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/f3bf8ca2-962e-4f72-a93f-3416f4f9b0d1%40googlegroups.com.

Terra A Sztain-pedone

Apr 3, 2020, 8:55:20 PM

to westpa...@googlegroups.com

Dan,

Thank you for your prompt response!

In terms of parameter experimentation, and after reading the limitations section, I don’t think a tau any lower than 50 ps would be a good idea, and a higher one would be more expensive. I think I will compare against increasing the walkers per bin to 30…

In terms of the iterations getting stuck, I have only gone up to 20 iterations so far, and the bin occupancy seems to fluctuate between 16% and 21%. I’m not sure if this is normal this early on. I wonder if there is an expected bin occupancy per iteration.

For binning schemes I was originally using a 1D distance coordinate, though once I stayed at 20% from iteration 5 to 20, I chose a more complex coordinate.

Fortunately I have a handful of structures in the initial and the final conformation, so I did a PCA and chose a coordinate based on PC1 vs PC2. I still stay at 20% between iterations 5 and 20, though (again, out of 400 this could be completely normal?).
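For clarity, the percentages quoted here are the fraction of bins holding at least one walker. A minimal sketch of that bookkeeping for a 1D progress coordinate is below; the bin edges and walker values are made up for illustration, and the function is hypothetical rather than a WESTPA tool.

```python
from bisect import bisect_right


def occupied_fraction(pcoords, boundaries):
    """Fraction of 1D bins that hold at least one walker.

    boundaries: sorted bin edges; bin i spans [boundaries[i], boundaries[i + 1]).
    Walkers outside the outermost edges are ignored.
    """
    n_bins = len(boundaries) - 1
    occupied = set()
    for x in pcoords:
        i = bisect_right(boundaries, x) - 1
        if 0 <= i < n_bins:
            occupied.add(i)
    return len(occupied) / n_bins


# Hypothetical example: 90 bins of width 1 and five walkers.
boundaries = [float(i) for i in range(91)]
walkers = [0.5, 0.7, 1.2, 5.9, 17.3]
print(f"{100 * occupied_fraction(walkers, boundaries):.1f}% of bins occupied")
```

Tracking this number per iteration makes it easier to see whether occupancy has truly plateaued or is still creeping upward.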

My next plan is to use a progress coordinate of RMSD from initial structure vs RMSD from final structure. I plan to set this up over the weekend and think this could be more amenable to systems where multiple structures do not exist.

I recently saw the WE + MSM paper and will need to read that more thoroughly, as well as the additional resources, thank you.

We have a few WE experts in the lab but none who use WESTPA, so it would be great to collaborate or get more detailed assistance!

Best,

Terra

Lillian Chong

Apr 4, 2020, 5:52:05 PM

to westpa...@googlegroups.com

Terra,

Based on my lab's efforts with protein binding simulations over the past several years, I can share with you some tips that are specific to running such simulations using the WE strategy.

I echo Dan's comments about the million-atom system that you have in mind -- using all-atom models, it would not be feasible to simulate binding events for such a large system. However, your other two systems would be feasible if the binding processes are on timescales similar to those of the p53/MDM2 and barnase/barstar systems (multi-microseconds for ~1 mM receptor concentration).

If you haven't already, please check out my lab's 2019 Chemical Science paper on explicit-solvent simulations of the barnase/barstar binding process. The system consisted of ~100,000 atoms, and the simulation can be completed in ~10 days using 16 GPUs in parallel. I suggest starting with the smaller of your two systems and trying out the simulation protocol in this paper, including the binning scheme, which can be adjusted if needed as your simulation progresses.

Some key features of the simulation protocol:

* It is important to start your binding simulation from representative unbound conformations of the binding partners, i.e. well-equilibrated conformations, to ensure that the resulting binding pathways will be representative. To generate unbound conformations, we ran separate "preparatory" WE simulations of each binding partner (relevant scripts can be found here).

* It was necessary to use a 2D progress coordinate: In one dimension, we tracked a "binding" RMSD of the "anchor residues" (residues that become the most buried upon binding) in one binding partner after aligning on the other binding partner. The binding RMSD has been effective in discriminating between different relative orientations of the binding partners. In the other dimension, we tracked the minimum separation distance between the binding partners. This distance coordinate detects when the binding partners have collided and is essential if you want to be able to calculate the rate constants for each of the two steps in the binding process, including the formation of the collision complex.
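A minimal NumPy sketch of the two coordinates described above, for illustration only. It assumes the coordinates are already loaded as (N, 3) arrays in consistent units, that a reference bound structure is available, and that the atom selections (atoms of one partner for alignment, anchor-residue atoms of the other) have been made elsewhere. The function names and the synthetic test data are hypothetical; this is not the script used in the paper.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t that best superpose mobile onto target.

    The aligned coordinates are mobile @ R.T + t.
    """
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ mob_c
    return R, t


def binding_rmsd(partner_a, anchors_b, ref_partner_a, ref_anchors_b):
    """RMSD of partner B's anchor atoms after aligning the frame on partner A."""
    R, t = kabsch(partner_a, ref_partner_a)
    moved = anchors_b @ R.T + t
    return float(np.sqrt(np.mean(np.sum((moved - ref_anchors_b) ** 2, axis=1))))


def min_separation(atoms_a, atoms_b):
    """Minimum pairwise distance between two sets of atoms."""
    diff = atoms_a[:, None, :] - atoms_b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


# Synthetic check: a rigidly rotated and translated copy of the reference
# complex should give a binding RMSD of ~0; shifting the anchors by 3 units
# in the reference frame should give a binding RMSD of ~3.
rng = np.random.default_rng(0)
ref_rec = rng.normal(size=(30, 3)) * 5.0
ref_anchors = rng.normal(size=(4, 3)) + np.array([8.0, 0.0, 0.0])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
shift = np.array([3.0, -2.0, 1.0])
rec = ref_rec @ rot.T + shift
anchors_bound = ref_anchors @ rot.T + shift
anchors_displaced = (ref_anchors + np.array([3.0, 0.0, 0.0])) @ rot.T + shift

print(binding_rmsd(rec, anchors_bound, ref_rec, ref_anchors))
print(binding_rmsd(rec, anchors_displaced, ref_rec, ref_anchors))
print(min_separation(rec, anchors_bound))
```

In a real setup the per-frame pair (binding RMSD, minimum separation) would be written out as the 2D progress coordinate; for large selections the all-pairs distance array can be replaced by a neighbor search to save memory.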

Also, please carefully review the LiveCoMS suite of WESTPA tutorials, particularly Sections 1-5, which cover best practices as well as guidelines for choosing WE parameters, and the checklist for troubleshooting WE simulations.

Let me know if you have any further questions. We would be happy to provide feedback on your preliminary simulations to help refine your protocol.

All the best,

Lillian

--

Lillian T. Chong
Associate Professor
Department of Chemistry
University of Pittsburgh
219 Parkman Avenue
Pittsburgh, PA 15260
(412) 624-6026

Terra A Sztain-pedone

Apr 7, 2020, 9:20:17 AM

to westpa...@googlegroups.com

Lillian,

Thank you for the detailed response, and for making all your scripts so accessible!

By the way, most of the lab has redirected its efforts towards COVID-19 simulations. I will be doing the enhanced sampling, and will likely try to apply WESTPA to some of our models. To avoid redundancy, and for the sake of collaboration, it would be great to discuss the projects other WESTPA users are undertaking. Maybe I should start a separate topic on this, or we can chat more offline?

Best,

Terra Sztain

PhD Candidate, Chemistry and Biochemistry

Burkart & McCammon Groups

University of California, San Diego

tszt...@ucsd.edu

Lillian Chong

Apr 8, 2020, 8:58:59 AM

to westpa...@googlegroups.com

Terra,

I would be happy to chat with you offline. In the interest of collaboration and avoiding redundancy, it's worth checking out the following websites for molecular modeling efforts that are already in progress from various groups around the world:

http://www.hecbiosim.ac.uk/covid-19-projects

https://www.deshawresearch.com/downloads/download_trajectory_sarscov2.cgi/

As you know, the Shaw group has shared some exciting results from weighted ensemble simulations involving SARS-CoV-2 attachment using their in-house code for Anton (see second link above).

All the best,

Lillian
