Using msm_we / haMSM restart plugin with equilibrium (non-recycling) WE simulations?

Isabel Thompson

Mar 31, 2026, 12:27:25 PM

to westpa-users

Hi WESTPA community,

I'm building an analysis pipeline for RNA folding WE simulations that hinges on approximating the slow modes of the dynamics and building haMSMs from simulations that are run in equilibrium mode (no recycling boundary conditions, no source/sink states) — walkers explore freely with splitting/merging to maintain coverage across bins.

For validation, I'd like to build a reference haMSM in progress coordinate space (traditional haMSM implementation by Copperman and Zuckerman) to compare against the haMSM I build in learned slow mode space. The idea is to check whether the learned slow mode coordinates actually capture longer timescales than the raw pcoord and to cross-validate between the two.

I have only scratched the surface of the literature so far, and I have some overarching questions about the approach:

Does `msm_we` work with equilibrium WE data? My understanding is that history labels are assigned based on which macrostate was last visited (determined from the MSM bin assignments and walker genealogy), not from recycling per se. But I'm not sure whether the `msm_we` code assumes recycling boundary conditions internally — e.g., whether it requires explicit source/sink state definitions determined by the basis and target states defined in `west.cfg` that wouldn't apply to my setup.

Can the WESTPA 2.0 haMSM restarting plugin be used in "analysis-only" mode (e.g., `n_restarts: 0`) to build an haMSM from multiple independent equilibrium WE runs without triggering the restart protocol? I'd like to leverage the multi-run aggregation and variance estimation but manage the simulations myself. I currently use WESTPA 2022.10 with the WEED driver and no re-weighting.

If `msm_we` doesn't support equilibrium WE, is there a recommended approach for building a pcoord-space haMSM from equilibrium WE data as a validation reference? I already have the machinery for weighted count matrices, per-iteration flux aggregation, and genealogy tracing via parent_id — so I could implement history label assignment myself, but I'd rather not reinvent the wheel if there's an existing tool.

Any guidance would be much appreciated. Even pointing to relevant literature that answers any of these questions would be amazing. Happy to share more details about the pipeline if helpful.

Thanks so much!

Isabel Thompson
PhD Candidate | The Corcelli Lab
Department of Chemistry & Biochemistry
University of Notre Dame
itho...@nd.edu

Daniel Zuckerman

Apr 3, 2026, 11:51:30 AM

to westpa-users

Hi Isabel. Your situation is an important one because, as I'm guessing, you wanted to ensure good exploration of configuration space via WESTPA without committing yourself to particular source and sink states.

In principle, the msm_we and haMSM tools should be able to achieve what you want, but I don't have the nuts-and-bolts knowledge to answer the first two questions right now. Jeremy from Lillian's group is on vacation this week, and I hope he can address those once he settles back into work.

On the other hand, we have been extremely focused on your general issue, framed as: I have a set of trajectory data, what's the best way to estimate steady state? And by 'steady state', I mean either equilibrium or a nonequilibrium SS based on arbitrary source-sink choices.

The newer solution we propose to this problem seems to be better than what's possible with MSM/haMSM approaches because it does not rely on specific discretizations of phase space. The analysis also lends itself to your question of whether a certain set of coordinates provides a better featurization than another.

The new approach, called RiteWeight, uses iterative solutions to (ha)MSMs with random clusterings in order to estimate steady-state weights for trajectory segments that are consistent with any clustering. You can see our main preprint on it:

https://arxiv.org/abs/2401.05597 (now accepted to PNAS)

The code works with (single-tau) WESTPA segments, and we will be placing some code on the github to facilitate extracting those.

A second preprint describes our in-progress work addressing noise issues with RiteWeight, which we plan to expand to include uncertainty estimation:

https://chemrxiv.org/doi/10.26434/chemrxiv.15001337/v1

Finally, some initial results applying RiteWeight to WE data can be found here:

https://www.biorxiv.org/content/10.64898/2026.03.24.714034v1

Let me know if you have any further questions. --Dan Zuckerman

Jeremy Leung

Apr 6, 2026, 12:21:54 PM

to westpa-users

Hi Isabel,

Welcome. Please consider everything that Dan mentioned. It's been a while (2-3 years) since I've seriously touched `msm_we` and the restarting plugin, but to answer your questions (if you end up using msm_we):

Does `msm_we` work with equilibrium WE data?

Yes, it does, but you'd have to use a workaround: set the source and sink states to a region that is never accessed. For example, with an RMSD progress coordinate, which is never negative, you could add two extra bins in the negative range and point the source and sink states to those.
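As a rough illustration of that workaround, here is a minimal sketch of what the padded binning could look like. The bin edges, the names `dummy_source_bounds` and `dummy_sink_bounds`, and the helper `bin_index` are all hypothetical and only show the idea; how the bounds are actually passed to `msm_we` or the plugin is not shown here and should be taken from their documentation.

```python
import numpy as np

# Hypothetical RMSD bin edges; the real ones come from your own WE setup.
physical_edges = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0, float("inf")]

# Two extra bins below zero that a non-negative RMSD can never reach:
# [-2, -1) acts as the dummy "source", [-1, 0) as the dummy "sink".
dummy_edges = [-2.0, -1.0]
edges = np.array(dummy_edges + physical_edges)

dummy_source_bounds = (-2.0, -1.0)  # assumed placeholder region for the source
dummy_sink_bounds = (-1.0, 0.0)     # assumed placeholder region for the sink


def bin_index(pcoord):
    """Return the 0-based bin index for each progress-coordinate value."""
    return np.digitize(pcoord, edges) - 1


# Sanity check: no physically possible RMSD value lands in bins 0 or 1.
rmsd_samples = np.random.default_rng(0).uniform(0.0, 20.0, size=1000)
print(bin_index(rmsd_samples).min())
```

Since the two dummy bins stay empty, no walker is ever labeled as having reached the source or the sink, which is the point of the trick.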

Can the WESTPA 2.0 haMSM restarting plugin be used in "analysis-only" mode (e.g., `n_restarts: 0`) to build an haMSM from multiple independent equilibrium WE runs without triggering the restart protocol?

Yes. Setting `n_restarts: 0` will run the plugin in analysis-only mode; see https://westpa.readthedocs.io/en/latest/documentation/ext/westpa.westext.hamsm_restarting.html#doing-only-post-analysis

If `msm_we` doesn't support equilibrium WE, is there a recommended approach for building a pcoord-space haMSM from equilibrium WE data as a validation reference?

It does, as I mentioned above. You can also import the `west.h5` file directly into `msm_we` and do the MSM building on your own. I would recommend installing msm_we from https://github.com/ZuckermanLab/msm_we since that version supports NumPy 2 and recent Python versions.

Best,

Jeremy L.

---

Jeremy M. G. Leung, PhD
Research Assistant Professor, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]

Hayden Scheiber

Apr 11, 2026, 1:48:40 AM

to westpa-users

Hi Isabel,

I want to add that history-augmented Markov state models (haMSMs) are constructions stemming from directed simulations. HaMSMs can't be derived from equilibrium WE simulations unless you have a progress coordinate explicitly tracking history, as otherwise the two directed ensembles would get hopelessly mixed by WE resampling.

HaMSMs derive their primary utility from the fact that their stationary solution (which for any MSM is independent of lag time) corresponds to a cycle that can be used to directly compute the rate of forward flux into the target at steady state.

On the other hand, the stationary solution of an MSM built from an equilibrium WE simulation (still independent of lag time) corresponds to an equilibrium distribution, not a steady state of a cycle. Getting a rate from such an MSM requires using the non-principal eigenvectors, which correspond to relaxation modes. These relaxation modes depend strongly on the choice of lag time.
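To make the lag-time point concrete, here is a minimal toy sketch with illustrative numbers only (nothing here comes from a real simulation or from `msm_we`). It lumps a hypothetical 3-state reversible chain `P` into two macrostates via an assumed membership matrix `M`, then compares the stationary distribution and the slowest implied timescale, t = -lag / ln(lambda_2), of the coarse model at several lag times.

```python
import numpy as np

# Toy 3-state reversible chain (made-up numbers for illustration).
P = np.array([
    [0.90, 0.10, 0.00],
    [0.10, 0.80, 0.10],
    [0.00, 0.05, 0.95],
])

# Assumed coarse-graining: fine states 0 and 1 -> macrostate A, state 2 -> macrostate B.
M = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
])


def stationary_distribution(T):
    """Left eigenvector of a row-stochastic matrix for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    vec = evecs[:, np.argmax(evals.real)].real
    return vec / vec.sum()


def coarse_msm(P, M, lag):
    """Equilibrium-weighted coarse transition matrix estimated at `lag` steps."""
    pi = stationary_distribution(P)
    counts = M.T @ np.diag(pi) @ np.linalg.matrix_power(P, lag) @ M
    return counts / counts.sum(axis=1, keepdims=True)


def slowest_implied_timescale(T, lag):
    """t = -lag / ln(lambda_2), with lambda_2 the second-largest eigenvalue of T."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(evals[1])


lags = (1, 5, 20)
coarse_pi = {lag: stationary_distribution(coarse_msm(P, M, lag)) for lag in lags}
coarse_ts = {lag: slowest_implied_timescale(coarse_msm(P, M, lag), lag) for lag in lags}
exact_ts = slowest_implied_timescale(P, 1)

for lag in lags:
    print(lag, coarse_pi[lag], coarse_ts[lag])
print("fine-grained reference:", exact_ts)
```

In this toy case the coarse stationary distribution is the same at every lag, while the coarse implied timescale starts at roughly 9.5 steps at lag 1 and creeps up toward the fine-grained value of roughly 13.4 steps as the lag grows.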

All this to say it sounds like you are planning to build a normal MSM, and that is okay! Sounds like a cool project!