Hi Hayden,
- I save my protein coordinates as trajectories to per iteration HDF5
files, while the restart files (also saved to per iteration HDF5 files)
contain full system data. Does the haMSM module play nicely with the
HDF5 framework? If so, how to choose the struct_filetype field for the
haMSM plugin in west.cfg when using the HDF5 framework?
I believe there were efforts to streamline this, but they were never implemented. You might have to manually crawl your per-iteration files and create your own 'coord' auxdata set, which the rest of the plugin uses to build the MSM. msm_we relies on the 'coord' aux dataset in the main h5 file. If that dataset already exists, the plugin will skip saving coordinates for that iteration.
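As a rough sketch of what "create your own 'coord' auxdata set" could look like with h5py: the group path below follows the usual west.h5 layout (iterations/iter_XXXXXXXX/auxdata), but the function name is made up for illustration, and the array shape msm_we expects is an assumption you should check against your msm_we version. Try it on a copy of west.h5 first.

```python
import h5py
import numpy as np


def write_coord_auxdata(h5_path, n_iter, coords):
    """Store `coords` as the 'coord' aux dataset for iteration `n_iter`.

    Hypothetical helper, not part of WESTPA or msm_we. The shape of
    `coords` is whatever your analysis expects; it is not validated here.
    Returns True if the dataset was written, False if it already existed.
    """
    with h5py.File(h5_path, "a") as f:
        # WESTPA names iteration groups with an 8-digit zero-padded index.
        group = f.require_group(f"iterations/iter_{n_iter:08d}/auxdata")
        if "coord" in group:
            # Leave existing data alone, mirroring the skip behaviour
            # described above.
            return False
        group.create_dataset("coord", data=np.asarray(coords))
        return True
```

You would call this once per iteration, with coordinates loaded from your per-iteration HDF5 trajectory files.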
- The tutorial 7.7 example provides a restart_overrides.py file with two
functions, processCoordinates and reduceCoordinates, yet the westpa docs
only mention the need for processCoordinates. Are both functions
needed?
`reduceCoordinates` is used when copying the (xyz) coordinates into west.h5 as the 'coord' auxdata. For example, if you don't need the water coordinates, you can remove them in this function. `processCoordinates` is used for featurization, which happens during the MSM building process. Using `reduceCoordinates` is highly encouraged, because loading unnecessary coordinates drastically increases the RAM needed to build the MSM, which is already very high given the amount of data involved.
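A minimal sketch of a reduceCoordinates that drops unwanted atoms. The (self, coords) signature, the (..., n_atoms, 3) array layout, and the KEEP_ATOMS index list are all assumptions for illustration; check the tutorial's restart_overrides.py for what your version actually passes in.

```python
import numpy as np

# Hypothetical: indices of the atoms to keep (e.g. the protein atoms).
# In practice you would build this from your topology.
KEEP_ATOMS = np.array([0, 1, 2])


def reduceCoordinates(self, coords):
    """Keep only the atoms listed in KEEP_ATOMS.

    Assumes `coords` has shape (..., n_atoms, 3), so the atom axis is
    second to last.
    """
    coords = np.asarray(coords)
    return coords[..., KEEP_ATOMS, :]
```

With this in place, only the selected atoms end up in the 'coord' auxdata, so less has to be loaded when the MSM is built.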
- It is more efficient for me to compute all of the reduced
coordinates needed for haMSM during the progress coordinates calculation
stage (don't need to re-load coordinates into MDAnalysis/mdtraj) and
save these as auxiliary data sets. Is it possible to skip
the processCoordinates function and just pull reduced coordinates from
westpa aux datasets for haMSM analysis? How would this be done?
That is exactly how the whole process works. The plugin reads the coordinates from the 'coord' auxdata set for featurization, so you can store your already-featurized data in that dataset. In processCoordinates, just return the coordinates as is.
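In that case processCoordinates reduces to a pass-through. A sketch, again assuming the (self, coords) signature from the tutorial's overrides file:

```python
def processCoordinates(self, coords):
    """Pass-through featurizer.

    Assumes the 'coord' auxdata already holds the featurized data,
    so nothing is left to compute here.
    """
    return coords
```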
- For the Restarting plugin folder structure:
It is still the case. That's because the restarting plugin expects a specific file structure which is built upon $WEST_SIM_ROOT. If you have a lot of bstates, you can actually circumvent it by directly appending your path in the auxref column of bstates.txt. For example, if you do the following, $WEST_STRUCT_DATA_REF will point to $WEST_SIM_ROOT/bstates/{01..100}, where your 100 bstate folders live in $WEST_SIM_ROOT/bstates.
Say in west.cfg you have the following:
west:
  data:
    data_refs:
      basis_state: $WEST_SIM_ROOT
My $WEST_SIM_ROOT/bstates.txt file (which you can put wherever, just make sure you point to the right path when calling w_init):
b01 0.25 bstates/01
b02 0.25 bstates/02
b03 0.25 bstates/03
b04 0.25 bstates/04
...
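If you have many bstates, a file in the format above can be generated rather than typed. This is only a sketch: the helper name is made up, and it assumes equal weights and zero-padded folder names under bstates/ as in my example.

```python
def bstates_lines(n_states, subdir="bstates"):
    """Build bstates.txt lines: label, probability, auxref path.

    Hypothetical helper. Assumes equal probabilities and folders named
    with zero-padded numbers (01, 02, ...) under `subdir`.
    """
    prob = 1.0 / n_states
    # Pad to at least two digits, or wider if n_states needs it.
    width = max(2, len(str(n_states)))
    return [
        f"b{i:0{width}d} {prob:g} {subdir}/{i:0{width}d}"
        for i in range(1, n_states + 1)
    ]


if __name__ == "__main__":
    # Print the four-state example; redirect to bstates.txt if wanted.
    print("\n".join(bstates_lines(4)))
```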
Hope that helps!
-- JL