Reassigning $WEST_PCOORD_RETURN outside runseg.sh

Harry Ryu

Aug 27, 2025, 10:59:23 AM

to westpa-users

Hello everyone,

In my current WESTPA setup, the pcoord ($WEST_PCOORD_RETURN) is assigned within runseg.sh. This is not what I want, because my pcoord calculation is expensive, especially when it is run for every trajectory by launching a custom Python script. Another important detail is that I want to assign bins based on these pcoord values.

I want to switch to calculating the pcoord in batch. My current idea is shown below:

a) After running the regular dynamics, assign some dummy pcoord as a placeholder

b) Calculate the actual pcoord in batch (for all trajectories within an iteration, using a separate Python script that will be called in post_iter.sh)

c) Update the pcoord (in h5 or $WEST_PCOORD_RETURN?) with real values

d) Assign bins for the next iteration based on the updated pcoord values
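
Steps (a) through (d) might be sketched as follows. This is a toy, pure-Python illustration of the data flow, not WESTPA code: `PLACEHOLDER`, `expensive_pcoord`, `batch_pcoords`, `assign_bins`, and the bin boundaries are all hypothetical stand-ins.

```python
from bisect import bisect_right

PLACEHOLDER = 0.0  # hypothetical dummy value written by runseg.sh in step (a)


def expensive_pcoord(trajectory):
    """Hypothetical stand-in for the costly calculation: mean of the frames."""
    return sum(trajectory) / len(trajectory)


def batch_pcoords(trajectories):
    """Step (b): compute pcoords for every segment of an iteration in one call."""
    return {seg_id: expensive_pcoord(traj) for seg_id, traj in trajectories.items()}


def assign_bins(pcoords, boundaries):
    """Step (d): map each pcoord to the index of the bin that contains it."""
    return {seg_id: bisect_right(boundaries, value) - 1 for seg_id, value in pcoords.items()}


# Toy iteration with three segments, each a list of per-frame values.
trajectories = {0: [1.0, 3.0], 1: [4.0, 6.0], 2: [9.0, 9.0]}
pcoords = {seg_id: PLACEHOLDER for seg_id in trajectories}      # step (a)
pcoords.update(batch_pcoords(trajectories))                     # steps (b) and (c)
bins = assign_bins(pcoords, boundaries=[0.0, 2.5, 5.0, 10.0])   # step (d)
```

The point of the sketch is only the ordering: placeholders exist first, the batch result overwrites them, and binning reads the overwritten values.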

Would something like this be possible to implement using the current WESTPA workflow?

Thank you in advance and please let me know if I can provide more detail.

Best,

Harry

Jeremy Leung

Aug 27, 2025, 11:28:36 AM

to westpa-users

Hi Harry,

Yes, this is doable. But oftentimes it will be slower, because you're doing the work in one process (the "main" westpa process) instead of parallelizing it across all your workers. We see this slowdown very prominently if you're working with the HDF5 framework on a huge system, for example. If you do get a speedup, the best approach is to offload a large part of the work to the workers (scraping the coordinates, etc.) and have the main process do the bare minimum.
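
The trade-off can be made concrete with a rough cost model. The sketch below assumes every pcoord calculation takes the same time and ignores I/O and scheduling overhead; `wall_time` is a hypothetical helper and the numbers are purely illustrative, not measurements.

```python
import math


def wall_time(n_segments, seconds_per_pcoord, n_workers):
    """Idealized wall-clock time when segments are spread evenly over workers."""
    return math.ceil(n_segments / n_workers) * seconds_per_pcoord


# Hypothetical iteration: 96 segments, 30 s per pcoord calculation.
in_main_process = wall_time(96, 30, n_workers=1)   # serial, in the main process
across_workers = wall_time(96, 30, n_workers=24)   # each worker handles 4 segments
print(in_main_process, across_workers)
```

Under these assumptions, batching in the main process only pays off if the batched calculation is much cheaper than the sum of the per-segment ones, for example when a large fixed start-up cost is paid once instead of once per segment.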

1. The first thing you need to do is make sure you assign bins to segments all at once at the end of an iteration (see the MAB code).

2. Probably do the calculations in the bin mapper? (See the older MAB code for how to pull things out. Having external files for the sim manager/driver would be the easiest to debug.)

3. You can pull the data manager to access the per-iteration information, since you need to write (see the DL-enhancedWE code): `import westpa; westpa.rc.sim_manager.data_manager ...`
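
For the write step in point 3, a helper along these lines could work. This is a sketch under assumptions: `overwrite_pcoords` is a hypothetical name, and it expects `iter_group` to behave like the per-iteration HDF5 group with a `pcoord` dataset laid out as segment, timepoint, dimension. The usage below substitutes a plain dict of nested lists so that it runs without WESTPA or h5py.

```python
def overwrite_pcoords(iter_group, new_pcoords):
    """Replace the stored pcoords of one iteration with recomputed values.

    iter_group  : mapping with a 'pcoord' entry laid out as
                  [segment][timepoint][dimension] (assumed layout)
    new_pcoords : nested sequence with exactly the same shape
    """
    stored = iter_group["pcoord"]
    if len(new_pcoords) != len(stored):
        raise ValueError("expected one pcoord array per segment")
    stored[:] = new_pcoords  # slice assignment writes in place


# Stand-in for an iteration group: 2 segments, 2 timepoints, 1 dimension.
# In a real run the group would presumably come from the data manager, e.g.
# westpa.rc.sim_manager.data_manager.get_iter_group(n_iter) (assumption).
fake_iter_group = {"pcoord": [[[0.0], [0.0]], [[0.0], [0.0]]]}
overwrite_pcoords(fake_iter_group, [[[1.2], [1.5]], [[3.4], [3.1]]])
```

Slice assignment is used because it works the same way on a list and on an array-like dataset, which keeps the helper testable outside a simulation.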

The following code should be a useful reference.

More complicated example:

https://github.com/westpa/DL-enhancedWE

Easier example to copy (MAB):

https://github.com/westpa/westpa/tree/westpa2/src/westpa/core/binning

https://github.com/westpa/user_submitted_scripts/tree/main/Adaptive_Binning/adaptive_2.0

Example of me trying to speed up the hdf5 framework, which requires offloading a majority of the coordinate extraction to each worker. I think the balance here is worth looking at:

https://github.com/westpa/westpa/pull/484

-- JL

John Russo

Aug 27, 2025, 9:35:40 PM

to westpa-users

Another option is to do your pcoord calculation separately and in parallel, but independently of the workers. What I mean by this is having the end of `runseg.sh` submit a pcoord calculation task. This calculation is done by a separate pool of workers, and the result is given to the main thread.

It sounds to me like you have two (related) needs:

- A pcoord calculation that can scale beyond a single worker's resources

- Some pcoord calculations need to be done using the results of all the workers, and can't be calculated using an individual trajectory. Similarly, you want to calculate binning using all of the results of the pcoord calculations.

The difference is if you do each calculation within runseg as you do now, calculations are effectively distributed across the workers, but each calculation only uses a single worker's resources. On the other hand, if runseg splits off a new separate task for the calculation, each calculation can use the entire resources of whatever pool it's submitted to.

For the second point: Since all the calculations are done by one thing, it can also wait and collect trajectories from multiple workers, and do calculations over them in aggregate.
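
A minimal sketch of this pattern, using a thread pool as a stand-in for the separate pool of pcoord workers. Everything here is hypothetical (`load_trajectory`, `aggregate_pcoords`, the toy data); in a real setup the pool would more likely be a job queue or a separate set of nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def load_trajectory(seg_id):
    """Hypothetical per-segment task: fetch the frames one segment produced."""
    return [float(seg_id), float(seg_id) + 2.0]


def aggregate_pcoords(trajectories):
    """Needs every trajectory: each segment mean relative to the ensemble mean."""
    means = [sum(t) / len(t) for t in trajectories]
    ensemble_mean = sum(means) / len(means)
    return [m - ensemble_mean for m in means]


# The pool stands in for compute resources separate from the WESTPA workers.
with ThreadPoolExecutor(max_workers=4) as pcoord_pool:
    futures = [pcoord_pool.submit(load_trajectory, seg_id) for seg_id in range(3)]
    trajectories = [f.result() for f in futures]  # wait until all segments report

pcoords = aggregate_pcoords(trajectories)
```

Because a single collector waits on all the futures, it can compute quantities that depend on the whole ensemble, which a calculation inside one `runseg.sh` call cannot see.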

This also lets you configure and provision the nodes for pcoord calculation separately from the WESTPA nodes. For example, if your pcoord calculation isn't GPU-optimized, you can run it on nodes with more CPUs instead of running it on the GPU worker node.

-John

Leung, Jeremy

Aug 28, 2025, 12:54:13 PM

to westpa...@googlegroups.com

Some great points from John (nice to hear from you!).

One last thing I forgot to mention: if you want to do the analysis/calculations during dynamics propagation, you can try out the streaming support PR with MDAnalysis. NAMD, GROMACS, and LAMMPS support the IMDv3 protocol, which lets you stream out the coordinates while the simulation is running. It is obviously still a WIP (both IMDClient and the WESTPA implementation), but it is another option (https://imdclient.readthedocs.io/en/latest/).
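
To illustrate what computing during propagation means, here is a sketch of an incremental calculation over frames as they arrive. It deliberately does not use the IMDClient or WESTPA streaming API: `stream_frames` is a hypothetical generator standing in for the live stream, and the running mean is just an example quantity.

```python
def stream_frames():
    """Hypothetical stand-in for a live coordinate stream; one value per frame."""
    yield from [2.0, 4.0, 9.0]


def running_mean(frames):
    """Update the mean after each incoming frame without storing the trajectory."""
    count, mean = 0, 0.0
    for value in frames:
        count += 1
        mean += (value - mean) / count
        yield mean


pcoord_history = list(running_mean(stream_frames()))
```

With this style of update, the value is ready as soon as the last frame arrives, so no separate pass over a finished trajectory file is needed.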

Added new class with methods to handle trajectory streaming by jpkrowe · Pull Request #501 · westpa/westpa (github.com)

-- JL

---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]

On Aug 27, 2025, at 9:35 PM, John Russo <jdr...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/westpa-users/fc517b19-f93c-4dd0-9205-3dae567f1aa4n%40googlegroups.com.

Daniel Zuckerman

Aug 28, 2025, 2:30:41 PM

to westpa...@googlegroups.com

Thanks everyone for these suggestions (which are all way over my head!). --Dan

From: 'Leung, Jeremy' via westpa-users <westpa...@googlegroups.com>
Sent: Thursday, August 28, 2025 9:54 AM
To: westpa...@googlegroups.com <westpa...@googlegroups.com>
Subject: [EXTERNAL] Re: [westpa-users] Reassigning $WEST_PCOORD_RETURN outside runseg.sh

Harry Ryu

Aug 28, 2025, 2:45:14 PM

to westpa-users

Hi John, glad to hear from you as well and hope you're doing well in SD!

Thank you for the suggestions. I'll have to think about the different ideas from this thread and see how they can be implemented.

Best,

Harry

On Thursday, August 28, 2025 at 1:30:41 PM UTC-5, Daniel Zuckerman wrote:

Hayden Scheiber

Aug 28, 2025, 4:53:24 PM

to westpa-users

Hi Harry,

I've implemented something similar to this in my research on weighted ensemble simulations. I think your most straightforward and flexible option is what Jeremy already said: perform your pcoord calculation directly in a custom Bin Mapper. This will ensure you are using your batched pcoords to assign bins as you see fit, and you can build your pcoords and assign them to bins entirely in memory without saving to file. The downside is that you will have to define your binning strategy explicitly within the custom bin mapper; you won't be able to use the included WESTPA strategies.
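
A sketch of what such a mapper function could look like. It follows the `func(coords, mask, output, ...)` calling convention that WESTPA's `FuncBinMapper` appears to use (an assumption worth checking against the installed version); `batch_bin_func`, the norm used as a pcoord, and the boundaries are hypothetical. Plain lists are used so the example runs without WESTPA or NumPy.

```python
from bisect import bisect_right


def batch_bin_func(coords, mask, output, boundaries=(0.0, 1.0, 2.0, 3.0)):
    """Assign every selected segment to a bin in a single call.

    coords : sequence of per-segment coordinate tuples
    mask   : sequence of booleans; entries that are False are left untouched
    output : pre-allocated sequence that receives one bin index per segment
    """
    # Batched stand-in for the expensive pcoord: the norm of every row.
    pcoords = [sum(x * x for x in row) ** 0.5 for row in coords]
    last_bin = len(boundaries) - 2
    for i, value in enumerate(pcoords):
        if mask[i]:
            # Explicit binning strategy: clamp values outside the boundaries.
            output[i] = min(max(bisect_right(boundaries, value) - 1, 0), last_bin)
    return output


coords = [(0.3, 0.4), (3.0, 4.0), (1.2, 1.6)]
output = batch_bin_func(coords, mask=[True, True, False], output=[0, 0, 0])
# With WESTPA installed, the function could presumably be wrapped as
# westpa.core.binning.FuncBinMapper(batch_bin_func, nbins=3) (assumption).
```

Note that the binning rule lives inside the function, which is the downside mentioned above: the built-in strategies are bypassed and the mapper has to define its own.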

The bin mapping is performed from the main WESTPA thread at the end of each iteration, after all walkers have finished their dynamics. At this stage you can be assured all necessary simulation data will be available, and you can assign whatever computational resources are available on the node to facilitate your pcoord calculation in a single batch, without worrying about it being used by active workers. Further, there is nothing stopping you from overwriting the saved dummy pcoords in the west.h5 file at this stage.

In my use-case I am calculating meaningful "raw" pcoords on-the-fly in runseg.sh, and then learning a transformation on the pcoords during the binning stage before binning them.
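
As an illustration of learning a transformation at the binning stage, the sketch below fits a z-score standardization on all raw pcoords of an iteration and then bins the transformed values. The z-score is a swapped-in example only, since the post does not say which transformation is learned; `fit_standardizer`, `to_bin`, and the edges are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(raw_pcoords):
    """Learn a simple transformation (z-score) from the whole batch of raw pcoords."""
    mu, sigma = mean(raw_pcoords), pstdev(raw_pcoords)
    return lambda value: (value - mu) / sigma if sigma else 0.0


def to_bin(value, edges=(-1.0, 0.0, 1.0)):
    """Count how many edges lie at or below the transformed value."""
    return sum(value >= edge for edge in edges)


raw = [2.0, 4.0, 6.0, 8.0]            # raw pcoords reported by the segments
transform = fit_standardizer(raw)     # fitted once per iteration, on all segments
bins = [to_bin(transform(v)) for v in raw]
```

The transformation is refit on every call, so the same raw value can land in a different bin from one iteration to the next as the ensemble moves.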

In your case it sounds like you'd be pulling trajectory data directly from the raw outputs (ideally saved within per-iteration h5 files for fast simplified I/O) to compute pcoords all at once in a batch.