Reassigning $WEST_PCOORD_RETURN outside runseg.sh

Harry Ryu

Aug 27, 2025, 10:59:23 AM
to westpa-users
Hello everyone,

In my current WESTPA setup, the pcoord ($WEST_PCOORD_RETURN) is assigned within runseg.sh. This is not what I want, because my pcoord calculation is expensive, especially when a custom Python script has to be launched separately for every trajectory. Another important detail is that I want to assign bins based on these pcoord values.

I want to switch to calculating the pcoord in batch. My current idea is shown below:

a) After running the regular dynamics, assign a dummy pcoord as a placeholder
b) Calculate the actual pcoord in batch (for all trajectories within an iteration) using a separate Python script called from post_iter.sh
c) Update the pcoord (in the h5 file, or via $WEST_PCOORD_RETURN?) with the real values
d) Assign bins for the next iteration based on the updated pcoord values

Would something like this be possible to implement using the current WESTPA workflow?
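
For concreteness, here is a minimal sketch of what I'm imagining for step (b): a standalone script called from post_iter.sh that loops over all segments of the just-finished iteration and computes their pcoords in one batch. The traj_segs layout, file names, and compute_pcoord() below are placeholders for illustration, not anything WESTPA provides.

```python
# batch_pcoord.py -- sketch of step (b); called from post_iter.sh as e.g.
#   python batch_pcoord.py $WEST_CURRENT_ITER
import glob
import os
import sys

import numpy as np


def compute_pcoord(traj_file):
    """Placeholder for the expensive per-trajectory pcoord calculation.

    Should return an array of shape (pcoord_len, pcoord_ndim)."""
    raise NotImplementedError


def main():
    n_iter = int(sys.argv[1])
    sim_root = os.environ.get("WEST_SIM_ROOT", ".")

    # Assumes the common traj_segs/ITER/SEG directory layout used in many
    # WESTPA examples; adjust to your own setup.
    seg_dirs = sorted(
        glob.glob(os.path.join(sim_root, "traj_segs", f"{n_iter:06d}", "*"))
    )

    pcoords = np.array(
        [compute_pcoord(os.path.join(d, "seg.dcd")) for d in seg_dirs]
    )

    # Leave the batched results somewhere the updater / bin mapper can find them.
    np.save(os.path.join(sim_root, f"pcoords_iter_{n_iter:06d}.npy"), pcoords)


if __name__ == "__main__":
    main()
```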

Thank you in advance and please let me know if I can provide more detail.

Best,
Harry

Jeremy Leung

Aug 27, 2025, 11:28:36 AM
to westpa-users
Hi Harry,

Yes, this is doable. But oftentimes it will be slower, because you're doing the work in one process (the "main" WESTPA process) instead of parallelizing it across all your workers. We see this slowdown very prominently when working with the HDF5 framework on a huge system, for example. If you do want a speedup, the optimal approach is to offload as much as possible to the workers (scraping the coordinates, etc.) and have the main process do the bare minimum.

1. The first thing you need to do is make sure you assign bins to all segments at once at the end of an iteration (see the MAB code).
2. You can probably do the calculations in the bin mapper (see the older MAB code for how to pull things out; keeping the sim manager/driver code in external files is the easiest to debug and so on).
3. You can pull the data manager to access the per-iteration information, since you need to write to it (see the DL-enhancedWE code): `import westpa; westpa.rc.sim_manager.data_manager ...`. A short sketch follows this list.
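
As a rough sketch of point 3: the accessor names below follow the current westpa.core data manager API, but double-check them against your installed version.

```python
# Sketch of pulling the data manager from inside a running WESTPA process
# (e.g. from a custom bin mapper or a sim-manager plugin) to overwrite the
# stored pcoords for an iteration with the batch-computed values.
import numpy as np
import westpa


def overwrite_pcoords(n_iter, new_pcoords):
    """Replace the stored progress coordinates for iteration n_iter.

    new_pcoords must match the dataset shape: (n_segs, pcoord_len, pcoord_ndim).
    """
    data_manager = westpa.rc.get_data_manager()
    iter_group = data_manager.get_iter_group(n_iter)  # /iterations/iter_XXXXXXXX

    pcoord_ds = iter_group['pcoord']
    assert new_pcoords.shape == pcoord_ds.shape, "shape mismatch with west.h5"

    pcoord_ds[...] = new_pcoords
    data_manager.flush_backing()  # push the changes out to west.h5
```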

The following examples should be useful references.

More complicated example: 

Easier example to copy (MAB):

Example of me trying to speed up the hdf5 framework, which requires offloading a majority of the coordinate extraction to each worker. I think the balance here is worth looking at:

-- JL

John Russo

Aug 27, 2025, 9:35:40 PM
to westpa-users

Leung, Jeremy

Aug 28, 2025, 12:54:13 PM
to westpa...@googlegroups.com
Some great points from John (nice to hear from you!).

One last thing I forgot to mention: if you want to do the analysis/calculations during dynamics propagation, you can try out the streaming-support PR with MDAnalysis. NAMD/GROMACS/LAMMPS support the IMDv3 protocol, which allows you to stream out coordinates while the simulation is running. It's obviously still a WIP (both on the IMDClient side and the WESTPA implementation), but it's another option (https://imdclient.readthedocs.io/en/latest/).

-- JL

---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]


Daniel Zuckerman

Aug 28, 2025, 2:30:41 PM
to westpa...@googlegroups.com
Thanks everyone for these suggestions (which are all way over my head!).  --Dan


Harry Ryu

Aug 28, 2025, 2:45:14 PM
to westpa-users
Hi John, glad to hear from you as well and hope you're doing well in SD!

Thank you for the suggestions. I'll have to think about the different ideas from this thread and see how they can be implemented. 

Best,
Harry 


Hayden Scheiber

Aug 28, 2025, 4:53:24 PM
to westpa-users
Hi Harry,

I've implemented something similar in my research on weighted ensemble simulations. I think your most straightforward and flexible option is what Jeremy already said: perform your pcoord calculation directly in a custom bin mapper. This ensures you use your batched pcoords to assign bins however you see fit, and you can compute your pcoords and assign them to bins entirely in memory, without saving anything to file. The downside is that you will have to define your binning strategy explicitly within the custom bin mapper; you won't be able to use the binning strategies that ship with WESTPA.

The bin mapping is performed from the main WESTPA thread at the end of each iteration, after all walkers have finished their dynamics. At this stage you can be assured that all the necessary simulation data is available, and you can devote whatever computational resources are free on the node to your pcoord calculation in a single batch, without worrying about them being used by active workers. Furthermore, there is nothing stopping you from overwriting the saved dummy pcoords in the west.h5 file at this stage.

In my use-case I am calculating meaningful "raw" pcoords on-the-fly in runseg.sh, and then learning a transformation on the pcoords during the binning stage before binning them. 
In your case it sounds like you'd be pulling trajectory data directly from the raw outputs (ideally saved within per-iteration h5 files for fast simplified I/O) to compute pcoords all at once in a batch.
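
As a rough sketch of that route, assuming a one-dimensional pcoord with fixed bin boundaries: FuncBinMapper is real WESTPA API, but batch_compute_pcoords() and the boundaries below are hypothetical stand-ins for your own calculation and binning scheme.

```python
# Do the batched pcoord work inside the binning function of a FuncBinMapper,
# so bins for the next iteration are assigned from the real (batch-computed)
# values rather than the dummy placeholders collected from runseg.sh.
import numpy as np
from westpa.core.binning import FuncBinMapper

BOUNDARIES = np.array([2.0, 4.0, 6.0, 8.0])  # example 1-D bin edges


def batch_compute_pcoords(n_segs):
    """Placeholder: compute real pcoords for all active segments in one batch
    (e.g. by reading per-iteration HDF5/trajectory files).

    Should return a 1-D array of length n_segs."""
    raise NotImplementedError


def map_batched(coords, mask, output):
    # `coords` holds the dummy pcoords WESTPA collected during propagation;
    # recompute the real values for the masked (active) segments in one go.
    real = batch_compute_pcoords(int(mask.sum()))
    output[mask] = np.digitize(real, BOUNDARIES)
    return output


# nbins must match the number of intervals implied by BOUNDARIES
mapper = FuncBinMapper(map_batched, nbins=len(BOUNDARIES) + 1)
```

You would then hook this mapper in wherever your system currently defines its bin mapper (e.g. system.py or the bins block of west.cfg).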

Hope this helps,

Hayden
