Recycling Trajectories - How are basis states assigned?

Hayden Scheiber

May 27, 2025, 1:09:26 PM

to westpa-users

Hello,

I have a bit of a technical question regarding basis state recycling when using recycling boundary conditions (i.e. steady state mode) without istate generation. Possibly a bug report.

I ran a production simulation of protein-protein binding with 960 basis states (1 initial segment per bstate). The basis states were randomly oriented pairs of proteins separated by a large distance. When examining my west.h5 data, I noticed something unexpected. I looked at the basis state ids that were selected during recycling events with the following code:

from westpa.analysis import Run

westh5 = "west.h5"  # path to the WESTPA HDF5 file

with Run(westh5) as run:
    for iter_idx in range(51, 201):
        # Get the recycled segment and basis state ids
        recycled_basis_ids = run.h5file[f"iterations/iter_{iter_idx+1:08d}/new_weights/index"]["initial_state_id"]
        print(recycled_basis_ids)

and I noticed the following pattern:
[ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68 70 74] [ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68 70 74] [ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68 70] [ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68 70] [ 1 2 6 16 17 22 30 32 38 43 47 49 50] [ 1 2 6 16 17 22 30 32 38 43 47] [ 1 2 6 16 17 22 30 32 38] [ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68 70 74 75 78 82 90 95] [ 1 2 6 16 17 22 30 32 38 43 47 49 50 54 56 59 68]

...
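One way to confirm this kind of pattern programmatically is to check whether every per-iteration id list is a prefix of the longest one. This is only a minimal sketch: the helper name `shares_common_prefix` is hypothetical, and the lists are a subset of the output shown above, typed in by hand.

```python
def shares_common_prefix(id_lists):
    """Return True if every list is a prefix of the longest list."""
    longest = max(id_lists, key=len)
    return all(list(ids) == list(longest[:len(ids)]) for ids in id_lists)


# A few of the per-iteration id lists printed above.
observed = [
    [1, 2, 6, 16, 17, 22, 30, 32, 38, 43, 47, 49, 50, 54, 56, 59, 68, 70, 74],
    [1, 2, 6, 16, 17, 22, 30, 32, 38, 43, 47],
    [1, 2, 6, 16, 17, 22, 30, 32, 38],
    [1, 2, 6, 16, 17, 22, 30, 32, 38, 43, 47, 49, 50, 54, 56, 59, 68, 70,
     74, 75, 78, 82, 90, 95],
]

print(shares_common_prefix(observed))
```

If the basis states were drawn independently at each iteration, a shared prefix across many iterations would be extremely unlikely with 960 states to choose from.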

I expected that basis state ids would be selected from the pool of available basis states at random, but it appears this is not the case. The bstates are sampled in the same order at every iteration, so bstate 1 is always selected for recycling. Where in WESTPA is the code that determines how bstate ids are selected during recycling (without istates)? I think it should be updated to randomize the selection of bstates at each iteration where recycling occurs, sampling them according to their user-defined weights.
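For illustration, the expected behavior (a fresh weighted draw each time recycling occurs) could be sketched as follows. This is not WESTPA code: the function `pick_bstates`, the equal weights, and the seed are assumptions made up for the example.

```python
import numpy as np


def pick_bstates(weights, n_recycled, rng):
    """Draw n_recycled basis-state indices at random, proportional to weights."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()  # normalize in case weights do not sum to 1
    return rng.choice(len(probs), size=n_recycled, replace=True, p=probs)


rng = np.random.default_rng(2025)  # seeded only to make this demo repeatable
weights = np.full(960, 1.0)        # hypothetical: 960 equally weighted bstates

# Each "iteration" gets an independent draw, so the id lists differ.
for _ in range(3):
    print(np.sort(pick_bstates(weights, 19, rng)))
```

Because the generator is created once and reused, successive calls continue the same random stream rather than restarting it.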

Cheers,

Hayden

Leung, Jeremy

May 30, 2025, 11:06:55 AM

to westpa...@googlegroups.com

Hi Hayden,

Looking at the code, I see that there isn't any randomness in selecting istates as recycled states (there might have been in older Python versions, but since dictionaries have been insertion-ordered since Python 3.7, that's out the window), but there is when generating those istates (and adding them to the list).

The next istate is pulled in insertion order:

```

initial_state = next(istateiter)

```

westpa/src/westpa/core/we_driver.py at westpa2 · westpa/westpa (github.com)

Which is generated from either here:

```

self.avail_initial_states = {state.state_id: state for state in initial_states}

```

westpa/src/westpa/core/we_driver.py at westpa2 · westpa/westpa (github.com)

or from here, in insertion order, if additional ones are needed:

```

self.we_driver.avail_initial_states[initial_state.state_id] = initial_state

```

westpa/src/westpa/core/sim_manager.py at westpa2 · westpa/westpa (github.com)
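To illustrate why insertion order matters here, the sketch below builds a dictionary keyed by state id and consumes it with an iterator, roughly the way repeated `next(istateiter)` calls would. The dictionary contents and the helper `take_in_order` are hypothetical stand-ins, and iterating over `.values()` is an assumption about how the iterator is built.

```python
from itertools import islice

# Hypothetical stand-in for the available initial states, inserted in id order.
avail_initial_states = {state_id: f"istate-{state_id}" for state_id in range(960)}


def take_in_order(avail, n):
    """Consume the first n states the way repeated next(istateiter) calls would."""
    istateiter = iter(avail.values())  # dicts iterate in insertion order (Python 3.7+)
    return list(islice(istateiter, n))


first = take_in_order(avail_initial_states, 5)
second = take_in_order(avail_initial_states, 5)
print(first)
print(first == second)
```

If the dictionary is rebuilt or refilled in the same order every iteration, the first few entries handed out to recycled walkers are the same every time.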

We do guarantee some randomness in the picking with the RNG, and the next istate to be generated is selected based on the bstate probabilities:

```

ibstate = np.digitize([self.rng.random()], self.next_iter_bstate_cprobs)

```

westpa/src/westpa/core/sim_manager.py at westpa2 · westpa/westpa (github.com)
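As a standalone illustration of the `np.digitize` idiom quoted above, the following sketch maps a uniform random number onto a basis-state index through cumulative probabilities. The three weights, the seed, and the helper name `bstate_index` are made up for the example and are not taken from WESTPA.

```python
import numpy as np

# Illustrative weights for three basis states (not from any real run).
bstate_probs = np.array([0.5, 0.25, 0.25])
bstate_cprobs = np.cumsum(bstate_probs)  # cumulative probabilities: 0.5, 0.75, 1.0


def bstate_index(x, cprobs):
    """Map a uniform draw x in [0, 1) to a basis-state index."""
    # np.digitize returns i such that cprobs[i-1] <= x < cprobs[i]
    return int(np.digitize([x], cprobs)[0])


rng = np.random.default_rng(7)
draws = [bstate_index(rng.random(), bstate_cprobs) for _ in range(10_000)]
counts = np.bincount(draws, minlength=len(bstate_probs))
print(counts / counts.sum())  # should be close to bstate_probs
```

Each index is hit with a frequency close to its weight, which is the sense in which the selection follows the bstate probabilities.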

That dictionary (`self.avail_initial_states`) should be cleared every iteration, so technically it should spit out a different list, but I wonder if it's somehow going through the 960 bstates (in order) before any new ones?

I'm inclined to say there is a bug with generating that `self.avail_initial_states`, but without knowing your bstate weights, segs/bin, binning scheme, WESTPA version, etc., it's hard to say for sure.

Best,

Jeremy L.

---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]


Hayden Scheiber

May 30, 2025, 12:18:29 PM

to westpa-users

Hi Jeremy,

Thanks for your reply! To answer your questions: all 960 bstates are generated with equal (1/960) weight. I used 16 segs per bin and WESTPA version 2022.11 for this run (it was from a while ago), and the binning scheme was a custom binless mapper (can't get into details, paper in progress!). The binning scheme uses the stock `BinlessDriver` as the WE driver with a slightly modified `MABSimManager`.

The modifications are just to remove the code where we "Assign this iteration's segments' initial points to bins and report on bin population" and remove the two lines of code where "Let the WE driver assign completed segments". Essentially lines 161-177 from mab_manager.py are removed as I cannot assign bins from just the initial points with my scheme.

What's odd is that I tried to reproduce the bug in WESTPA version 2022.12 using randomly generated 1D pcoords under similar conditions: same binning scheme, same number of basis states, and the same 16 segs per bin. I set it up so that about 5% would get recycled at random per iteration. In that test, I found the basis states were indeed selected at random after recycling. Not sure what changed, but I think you might be right that `self.avail_initial_states` was not getting cleared properly in my production simulation.

Cheers,

Hayden

Leung, Jeremy

May 30, 2025, 2:06:41 PM

to westpa...@googlegroups.com

Hi Hayden,

We swapped over from the global `np.random` instance in v2022.11 to an independent `np.random.Generator` in v2022.12 throughout WESTPA. So if something is seeding the global instance with `numpy.random.seed()` somewhere every iteration, that would explain why your recycled list is always the same. That change was made explicitly to guard against this kind of silent RNG bug.
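The effect can be reproduced outside WESTPA with a small sketch. The two draw functions and the per-iteration reseeding below are hypothetical; they only mimic a dependency that calls `np.random.seed()` each iteration and are not WESTPA's actual selection code.

```python
import numpy as np


def draw_with_global_rng(n_bstates, n_picks):
    """Draw from the process-wide legacy NumPy RNG (affected by np.random.seed)."""
    return np.random.randint(0, n_bstates, size=n_picks)


def draw_with_generator(rng, n_bstates, n_picks):
    """Draw from an independent Generator (unaffected by np.random.seed)."""
    return rng.integers(0, n_bstates, size=n_picks)


rng = np.random.default_rng()  # independent generator, created once

global_draws, generator_draws = [], []
for iteration in range(3):
    np.random.seed(1234)  # hypothetical: some dependency reseeds every iteration
    global_draws.append(draw_with_global_rng(960, 5))
    generator_draws.append(draw_with_generator(rng, 960, 5))

print(global_draws)     # the same five ids every iteration
print(generator_draws)  # a different set of ids each iteration
```

The global-RNG draws repeat because the stream restarts from the same seed each iteration, while the independent generator keeps advancing its own stream.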

Otherwise, I don't see any glaring changes in v2022.12 (bar maybe NumPy 2 support, but I don't think that touched any of the drivers/sim managers) that, together with your custom bin mapper, might cause this change.

Best,

Jeremy L.


Hayden Scheiber

May 30, 2025, 2:19:43 PM

to westpa-users

Thanks Jeremy, that change in v2022.12 likely resolved the issue. I don't see any deterministic seeding in my custom binning code, but it very well could be happening within some dependency. Good idea to have changed the RNG behavior!

Still, it's good to know that for anyone using a WESTPA version < 2022.12, this may bias the selection of bstates after recycling.