Hi Akshay,
Sorry it took us a while to answer this.
1) The 'trajlabels' dataset records the state that each segment in
west.h5 last visited. It has the same number of segments in every
iteration because numpy arrays must have a fixed shape along each
dimension
(https://numpy.org/doc/stable/user/basics.creation.html#arrays-creation).
The padding entries for non-existent trajectories are always labeled as
not occupying any state (so if you have 15 states defined (0-14),
they'll always be labeled 15 in every frame).
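To make the padding concrete, here is a minimal sketch in plain Python. It is not WESTPA's code: `pad_labels` is a hypothetical helper, and for brevity it keeps one label per segment rather than one per frame. It only shows how iterations with different segment counts end up in a fixed-shape array, with the extra slots filled by the "no state" index.

```python
def pad_labels(ragged, n_states):
    """Pad per-iteration label lists to a common length.

    `ragged` holds one list of segment labels per iteration.
    Slots with no real segment get the index `n_states`, i.e. the
    "not in any state" label (15 when states 0-14 are defined).
    """
    width = max(len(row) for row in ragged)
    return [row + [n_states] * (width - len(row)) for row in ragged]


# Iteration 1 has two segments, iteration 2 has three.
ragged = [[0, 1], [0, 1, 2]]
padded = pad_labels(ragged, n_states=15)
print(padded)
```

The third slot of the first iteration does not correspond to a real trajectory, so it carries the filler label.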
2) The 'statelabels' dataset
indicates which state each frame is currently in, whereas 'trajlabels'
gives the state it last visited. So if you have a trajectory that
went from State 0 --> unlabeled region --> State 1, `trajlabels`
will indicate "0,0,1", whereas `statelabels` will look like "0,15,1",
assuming you have 15 states defined (0-14 being the labeled ones).
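The relationship between the two datasets can be sketched in a few lines of plain Python. This is only an illustration of the "last visited" rule described above, not the actual WESTPA implementation; `last_visited` and its arguments are hypothetical names.

```python
def last_visited(statelabels, unknown):
    """Turn per-frame state labels into last-visited-state labels.

    `unknown` is the index used for frames outside every state
    (15 when states 0-14 are defined).
    """
    labels = []
    current = unknown  # nothing visited yet
    for state in statelabels:
        if state != unknown:
            current = state  # entered a labeled state
        labels.append(current)
    return labels


print(last_visited([0, 15, 1], unknown=15))  # [0, 0, 1]
```

A frame in the unlabeled region simply inherits the most recent labeled state.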
3) Regarding passing these to msm_we: theoretically yes, but I believe
msm_we requires much finer state definitions for the microbins, which
are probably really tedious to define in a west.cfg. I would suggest
using those definitions as your "stratified" bins in msm_we and letting
msm_we cluster them further. John Russo would probably be a better
person to answer this.
And about augmenting the 'trajlabels'
information, we suggest adding that as an auxiliary dataset. It should be
quite straightforward to write code in runseg.sh that 1) reads all the
pcoord values, 2) assigns them to states, and 3) passes that dataset to
the auxdata. If you want WESTPA to do it,
here's an example of assigning "color" to all trajectories using the bin mapper.
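For the runseg.sh route, the three steps might look roughly like the sketch below, which a runseg.sh could call as a small helper script. Everything here is an assumption for illustration: the 1D state boundaries in `STATE_BOUNDS`, the function names, and the one-label-per-line output file. How that file is handed to WESTPA as auxdata depends on your west.cfg and is not shown.

```python
# Hypothetical 1D state definitions as (low, high) pcoord ranges.
STATE_BOUNDS = [(0.0, 2.0), (8.0, 10.0)]
UNKNOWN = len(STATE_BOUNDS)  # index meaning "not in any state"


def assign_state(value):
    """Return the index of the state containing `value`, else UNKNOWN."""
    for index, (low, high) in enumerate(STATE_BOUNDS):
        if low <= value < high:
            return index
    return UNKNOWN


def label_segment(pcoords, parent_label=UNKNOWN):
    """Label each frame with the last state visited.

    `parent_label` carries over the label from the parent segment,
    so history is not lost between iterations.
    """
    labels = []
    current = parent_label
    for value in pcoords:
        state = assign_state(value)
        if state != UNKNOWN:
            current = state
        labels.append(current)
    return labels


def write_labels(labels, path):
    """Write one label per line to a text file."""
    with open(path, "w") as handle:
        for label in labels:
            handle.write(f"{label}\n")


print(label_segment([1.0, 5.0, 9.0]))  # [0, 0, 1]
```

Passing the parent's final label as `parent_label` matters when a segment starts in the unlabeled region; without it the segment would begin as "unknown" even though its parent had already visited a state.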
Best,
Jeremy L.