Incorrect Binning Behavior of MABBinMapper Scheme


baoyan liu

Mar 31, 2025, 10:02:02 PM
to westpa-users
Hi,

I am currently using WESTPA to sample the conformational change of a protein involved in a protein-DNA binding project. Because the system is quite large, with 916 residues, 100 DNA bases, and a ligand, I want to position bins with values decreasing along the progress coordinate. To achieve this, I set the direction to [-1, -1] within the MABBinMapper scheme block, as I have spaced the bins in two dimensions: dimension 0 for distance change (initial value 54.5 Å), and dimension 1 for RMSD change (initial value 1.0). However, when I plot the histogram evolution, the bins still split in both directions, which indicates that the direction setting [-1, -1] is not working. Can you help me with this?
Here's my configuration:
west:
  system:
    module_path: $WEST_SIM_ROOT
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 2
      pcoord_ndim: 2
      bin_target_counts: 8
      bins:    
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - ['-inf', 0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 10, 20, 30, 40, 50.0, 60]
            - ['-inf', 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 5.0, 10]
        mappers:
        - type: MABBinMapper
          nbins: [15, 15]
          direction: [-1, -1]
          bottleneck: true
          at: [3.0, 1.0]   # Near target state
  propagation:
    max_total_iterations: 100
    max_run_wallclock:    47:30:00
    propagator:           executable
    gen_istates:          false

Here is the plot:
20250401095739.png

Thanks a lot for your help,
Baoyan

Jeremy Leung

Apr 1, 2025, 12:02:09 PM
to westpa-users
Hi Baoyan,

MAB is working fine. It's a problem with your binning configuration. Your bins are too big near your starting states (away from your MAB bins), so you're essentially sampling in a "brute force" manner because the number of occupied bins is 1 (or close to 1). Let me clear up two misconceptions.

1. Misconfigured bins. See the picture below, where I omitted most bin boundaries except the ones that are relevant. You placed your MAB bins (225 of them + bottleneck + leading) in a bin between 3 <= pcoord_0 (distance) < 3.5 and 1 <= pcoord_1 (RMSD) < 1.5 (note: you specified `at: [3.0, 1.0]`, which is rounded to the right (larger value) of the boundary). That is all the way to the left, not where your trajectories are (starting all the way to the right!). We consider this sampling in a "brute force" manner because there is no resampling (due to no bin-to-bin transition) in the distance (pcoord_0) dimension.

Screenshot 2025-04-01 at 11.44.57 AM.png
On a side note, I suggest adding in 'inf' at the end of both pcoord dimensions, just so your simulation won't crash on you when you get a trajectory with distance >= 60 Å (or maybe this is intentional, I don't know).
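To make the bin lookup concrete, here is a minimal sketch using the base boundaries from the config above. It uses Python's `bisect` as a stand-in for the mapper's lookup (the helper `containing_bin` is hypothetical, not a WESTPA function) and assumes half-open bins, lower <= value < upper, as described in point 1.

```python
from bisect import bisect_right

INF = float("inf")

# Base-mapper boundaries copied from the configuration in the first post.
distance_edges = [-INF, 0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 10, 20, 30, 40, 50.0, 60]
rmsd_edges = [-INF, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 5.0, 10]


def containing_bin(edges, value):
    """Return the half-open interval (lower, upper) holding value, or None if out of range.

    Hypothetical helper for illustration only; not part of WESTPA.
    """
    idx = bisect_right(edges, value)  # edges[idx - 1] <= value < edges[idx]
    if idx == 0 or idx == len(edges):
        return None
    return (float(edges[idx - 1]), float(edges[idx]))


# Where the MAB mapper was placed with `at: [3.0, 1.0]`.
mab_bin = (containing_bin(distance_edges, 3.0), containing_bin(rmsd_edges, 1.0))
# Where the trajectories start (distance 54.5, RMSD 1.0).
start_bin = (containing_bin(distance_edges, 54.5), containing_bin(rmsd_edges, 1.0))
# A distance of 60 or more has no bin without a trailing 'inf' boundary.
beyond_last_edge = containing_bin(distance_edges, 60.0)

print(mab_bin)           # ((3.0, 3.5), (1.0, 1.5))
print(start_bin)         # ((50.0, 60.0), (1.0, 1.5))
print(beyond_last_edge)  # None
```

The starting point and the MAB region land in different distance bins, and the `None` result corresponds to the case the side note warns about.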

2. MAB Direction. While it's good you noticed a problem with the binning, the fact that "the bins still split in both directions" is not actually a problem. Firstly, the bins seen in the histogram are not the same bins used in resampling (the `plothist` default is the range between the min and max pcoord values divided into 100 bins). Based on your config, all of the segments are from the same bin between [50,60]! Secondly, using "direction" in MAB does not guarantee trajectories will only sample one direction. What it does is place bottleneck bins and leading bins only in that direction, so we can sample more towards that direction (i.e. guarantee more splitting, therefore more replicate trajectory segments). It is normal to see trajectories evolve in the opposite direction(s) if the fundamental dynamics of the system dictate it.
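As a rough illustration of the first point, the sketch below compares the two kinds of bins for a handful of made-up distances (the values in `distances` are hypothetical, not taken from the simulation). The histogram binning here is a plain min-to-max split into 100 bins, mimicking the default described above rather than reproducing the plotting tool's code.

```python
from bisect import bisect_right

INF = float("inf")
distance_edges = [-INF, 0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 10, 20, 30, 40, 50.0, 60]

# Hypothetical distances for eight segments that all stay between 50 and 60.
distances = [54.5, 53.8, 55.9, 52.4, 57.1, 54.0, 56.3, 51.7]

# Resampling bins: index of the base-mapper bin each segment falls into.
we_bins = {bisect_right(distance_edges, d) for d in distances}

# Histogram bins: 100 equal-width bins between the observed min and max.
n_hist = 100
lo, hi = min(distances), max(distances)
width = (hi - lo) / n_hist
hist_bins = {min(int((d - lo) / width), n_hist - 1) for d in distances}

print(len(we_bins))    # number of occupied resampling bins
print(len(hist_bins))  # number of occupied histogram bins
```

The segments spread over several histogram bins while occupying a single resampling bin, which is why the plot can show spreading even though no splitting between bins is happening along the distance.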

While I'm not sure about the fine details of your system, since your initial value for pcoord_1 (RMSD) is already ~1 Å and I'm assuming your target state still has a low RMSD, I would actually want to use '0' (both directions) for that dimension because I would expect the RMSD to actually increase a little bit then decrease, especially since your distance is ~50Å away! Using a `-1` would not benefit anything, considering the value is already in the ballpark there. Might as well not bias it at all.

Hopefully this helps.

Best,

Jeremy L.
---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]

baoyan liu

Apr 2, 2025, 9:05:50 AM
to westpa...@googlegroups.com
Dear Dr. Jeremy,

Thank you very much for your reply, for clarifying my misunderstanding, and for the illustrative picture.

Regarding your side note, yes, I should add 'inf' to the pcoord dimensions, particularly at the right boundaries; otherwise, it will keep crashing.

Based on your reply and your previous conversation with Hayden, I'm considering the following scheme. Could you please help confirm whether it is reasonable? Also, since you provided an illustrative picture earlier, could you please provide another diagram to clarify the leading bin and bottleneck locations? I understand a bottleneck represents the longest distance between two bins, indicating a significant barrier. However, I'm unclear about the definition of 'leading bins.' Are they bins near the target position, or those that initiate the next round of WESTPA simulations? I hope this isn't too much trouble. I've included my binning scheme below for your reference. Thank you in advance!

This is the binning scheme I intend to use:
      bins:
        # To use MAB binning, define a recursive bin mapper
        type: RecursiveBinMapper
        base:
            # This defines outer binning scheme
            type: RectilinearBinMapper
            boundaries:
                - [-inf, 0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 10, 20, 30, 40, 50.0, 60 22.17, inf] # Pcoord 0: distance
                - [-inf, 0, 1.0, 3.0, inf] # Pcoord 1: rmsd
        mappers:
          # Here we define the MAB binning scheme
          - type: MABBinMapper
            direction: [-1, 0]
            skip: [0, 0]
            bottleneck: True
            nbins: [10, 1]
            at: [54.0, 1.0]
            mab_log: true
            bin_log: true
          - type: MABBinMapper
            direction: [-1, 0]
            skip: [0, 0]
            bottleneck: True
            nbins: [20, 1]
            at: [44.0, 1.0]
            mab_log: true
            bin_log: true
          - type: MABBinMapper
            direction: [-1, 0]
            skip: [0, 0]
            bottleneck: True
            nbins: [20, 0]
            at: [24.0, 1.0]
            mab_log: true
            bin_log: true
 - type: MABBinMapper
            direction: [0, 0]
            skip: [0, 0]
            bottleneck: True
            nbins: [1, 3]
            at: [3.0, 1.0]
            mab_log: true
            bin_log: true

This is my understanding of the binning scheme:
image.png

All the best,
Sincerely yours,
Baoyan


Jeremy Leung

Apr 2, 2025, 2:36:18 PM
to westpa-users
Hi Baoyan,

Your MAB bins will still be confined by your rectilinear binning, so your drawing is not exactly correct. There is also a typo in the first pcoord boundary and the indentation for the last MAB Bin section is wrong. This is a YAML file so spaces/indents matter.

If this is your first time running this simulation (and/or running MAB), I would actually recommend just doing a big bin, instead of micro-managing using a multi-MAB scheme. This config is assuming anything above RMSD 3 doesn't matter too much. You can adjust the bin boundary in the base mapper if that assumption is wrong. With 8 segs per bin, you should get about 8 x [(15 x 3) rectilinear bins + 1 leading bin + 1 bottleneck bin] = 376 segments per iteration.

```
      bins:
        # To use MAB binning, define a recursive bin mapper
        type: RecursiveBinMapper
        base:
            # This defines outer binning scheme
            type: RectilinearBinMapper
            boundaries:
                - [0, inf] # Pcoord 0: distance
                - [0, 1.0, 3.0, inf] # Pcoord 1: rmsd

        mappers:
          # Here we define the MAB binning scheme
          - type: MABBinMapper
            direction: [-1, 0]
            skip: [0, 0]
            bottleneck: True
            nbins: [15, 3]
            at: [1, 2]
            mab_log: true
            bin_log: true
```
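For reference, the segment estimate quoted above can be reproduced with a few lines of arithmetic. This is only a sketch of that estimate: the variable names are illustrative, it assumes every bin is occupied, and it counts one leading and one bottleneck bin as in the reply.

```python
# Values taken from the thread: bin_target_counts and the MAB nbins setting.
segs_per_bin = 8
mab_nbins = (15, 3)

# Linear MAB bins in the two dimensions, plus the extra bins counted in the estimate.
linear_bins = mab_nbins[0] * mab_nbins[1]
leading_bins = 1
bottleneck_bins = 1

total_bins = linear_bins + leading_bins + bottleneck_bins
total_segments = segs_per_bin * total_bins

print(linear_bins)     # 45
print(total_segments)  # 376
```

Any occupied base bins outside the MAB region (RMSD below 1 or above 3 in this config) would presumably add their own segments on top of this figure.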

As for the diagram you requested... here's an example. Blue is the "leading" walker. Yellow is the bottleneck. Red is the rectilinear binning, based on the positions of the leading and trailing walkers. You should read the MAB paper (https://pubs.acs.org/doi/10.1021/acs.jpca.0c10724) for a better understanding of the "Z" function and how these bins are chosen. The pcoord is in parentheses, and the "weight" is located under each trajectory (dot).
Screenshot 2025-04-02 at 2.16.21 PM.png
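A very simplified, one-dimensional sketch of the idea in the diagram: evenly spaced bins between the trailing and leading walkers, with the leading walker picked according to `direction`. The function `sketch_mab_1d` and the pcoord values are hypothetical, this is not WESTPA's implementation, and the bottleneck walker is left out because choosing it requires the "Z" function from the paper.

```python
def sketch_mab_1d(pcoords, nbins, direction):
    """Illustrative only: linear bin edges plus the 'leading' walker(s).

    direction = -1 treats the smallest pcoord as leading, +1 the largest,
    and 0 treats both extremes as leading (both directions).
    """
    lo, hi = min(pcoords), max(pcoords)
    step = (hi - lo) / nbins
    # Evenly spaced edges spanning the current range of walkers.
    edges = [lo + i * step for i in range(nbins + 1)]
    if direction < 0:
        leading = [lo]
    elif direction > 0:
        leading = [hi]
    else:
        leading = [lo, hi]
    return edges, leading


# Hypothetical distances (in angstroms) for four walkers.
walkers = [54.5, 50.0, 52.0, 44.5]
edges, leading = sketch_mab_1d(walkers, nbins=10, direction=-1)
both_edges, both_leading = sketch_mab_1d(walkers, nbins=10, direction=0)

print(edges[0], edges[-1])  # 44.5 54.5
print(leading)              # [44.5]
print(both_leading)         # [44.5, 54.5]
```

Because the edges are recomputed from the current walkers, the bins move as the ensemble advances, which is the "adaptive" part of the scheme; the fixed base boundaries still confine where these bins can be placed.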
Best,

Jeremy L.

---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]

baoyan liu

Apr 3, 2025, 12:11:55 PM
to westpa...@googlegroups.com
Hi Jeremy,
Thank you so much for your kind reply, the illustrative picture, and the recommended paper. I have a lot to dive into to get to know WESTPA better. I will read the paper carefully. Thanks again.

All the best,
Baoyan
