Traj reading goes silently rogue for big systems


Ashlin James Poruthoor

Jan 16, 2024, 8:21:29 AM
to westpa-users
Hi,

I have been using westpa2 to simulate a range of lipid bilayers of different system sizes, and I ran into a strange problem with the bigger systems while using w_crawl. The iter_XXXXXX.h5 files don't have the usual structure, and w_crawl fails to detect some h5 attributes, such as pointers. I traced this back, and I think that because the system is large, the pdb files I was generating overflowed the residue number field (more than five digits), resulting in a warning like this:


-- WARNING [westpa.core.propagators.executable] -- could not read any data for trajectory: invalid literal for int() with base 16: 'A000G'

Thankfully, I had backed up the trajectories and pdbs, so I can crawl through the data set manually. But I wanted to point out that this could otherwise have been a disaster for me, and for anyone who wants to use the iter_XXXXXX.h5 files for other purposes after the WE run. Interestingly, WE ran smoothly, as if nothing had happened apart from the warnings in the job logs.

Note that I'm not using the iter_XXXXXX.h5 files to restart the simulations (which I believe might have caught this early on); instead, I restart from the dcds and pdbs. I do store the trajectories in the iter_XXXXXX.h5 files using $WEST_TRAJECTORY_RETURN, and that works perfectly for smaller systems.

I think a sanity check would be useful for cases where the trajectory information for a segment can't be parsed or handled properly and so isn't passed to the iter_XXXXXX.h5 files as expected. I also wonder whether there's a way to get around this common pdb issue for bigger systems inside westpa2, or whether handling it properly would be a good subject for a GitHub issue.

Thank you,
Ashlin
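
A user-side check along the following lines could flag such pdb files before a run is launched. This is only a minimal sketch and not part of westpa2: the names `find_nondecimal_fields` and `atom_line` are hypothetical, and the column positions assume the standard fixed-width PDB layout (atom serial in columns 7-11, residue number in columns 23-26). The value 'A000G' in the warning is five characters wide, the same width as the atom serial column, so the sketch checks both fields.

```python
def find_nondecimal_fields(pdb_lines):
    """Return (line_number, field_name, raw_text) for every ATOM/HETATM
    serial or residue-number field that is not a plain decimal integer."""
    problems = []
    for lineno, line in enumerate(pdb_lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width PDB columns (1-based): serial = 7-11, resSeq = 23-26.
        fields = {"serial": line[6:11], "resSeq": line[22:26]}
        for name, raw in fields.items():
            try:
                int(raw)
            except ValueError:
                problems.append((lineno, name, raw))
    return problems


def atom_line(serial, atom, res, chain, resseq):
    """Hypothetical helper: build the first 26 columns of an ATOM record."""
    return "ATOM  {:>5} {:<4} {:>3} {}{:>4}".format(serial, atom, res, chain, resseq)


sample = [
    atom_line(99999, "CA", "ALA", "A", 9999),    # still plain decimal
    atom_line("A000G", "CA", "ALA", "A", 9999),  # serial past 99999, non-decimal
]
for lineno, name, raw in find_nondecimal_fields(sample):
    print(f"line {lineno}: {name} field {raw!r} is not a decimal integer")
```

In practice the lines would come from the pdb file itself, for example `find_nondecimal_fields(open("system.pdb"))`, where the file name is a placeholder.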

Lillian Chong

Jan 16, 2024, 9:44:18 AM
to westpa...@googlegroups.com
Hi Ashlin,
Thanks for sharing this issue. Please go ahead and post this as a GitHub issue and someone will take a closer look.
Best,
Lillian


Jeremy Leung

Jan 16, 2024, 1:29:44 PM
to westpa-users
Hi Ashlin,

Thanks for posting this and bringing it to our attention. This is a more general limitation of the PDB format, which can't store large systems without workarounds. The error itself stems from an upstream issue: `mdtraj` is unable to parse the HYBRID_36 numbering convention in pdb files. (See these three PR/Issues)
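
For context, hybrid-36 keeps plain decimal for as long as the number fits in the field, then continues in base 36 with upper-case letters, then with lower-case letters. The sketch below follows the published hybrid-36 decoding rule as I understand it; the function name `hy36decode` and its argument order are illustrative, and this is not the code mdtraj or westpa2 uses. It also reproduces the message from the warning: 'G' is not a hexadecimal digit, so a base-16 parse of 'A000G' cannot succeed.

```python
def hy36decode(width: int, field: str) -> int:
    """Decode a fixed-width hybrid-36 field (sketch, not a library API)."""
    if len(field) != width:
        raise ValueError(f"expected {width} characters, got {field!r}")
    if field.strip() == "":
        return 0  # blank field
    first = field[0]
    if first in "- " or first.isdigit():
        return int(field)  # ordinary decimal range
    if not (field.isascii() and field.isalnum()):
        raise ValueError(f"invalid hybrid-36 literal: {field!r}")
    if first.isupper() and field == field.upper():
        # Upper-case range starts right after the largest decimal value.
        return int(field, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if first.islower() and field == field.lower():
        # Lower-case range starts right after the upper-case range.
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"invalid hybrid-36 literal: {field!r}")


print(hy36decode(5, "99999"))  # 99999
print(hy36decode(5, "A0000"))  # 100000
print(hy36decode(5, "A000G"))  # 100016

# A hexadecimal parse of the same field fails, as in the warning above.
try:
    int("A000G", 16)
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 16: 'A000G'
```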

There are a few fixes I can think of:
1) Use something other than a pdb file as your topology (which is probably better for a large system anyway, as the PDB format's fixed-width columns could affect other things too, like residue numbers and/or atom numbers)
2) Use the pdb, but don't use the HYBRID_36 convention (I think the other (VMD) way is to switch to hex after 99999, but I don't know whether that works properly with mdtraj either)
3) Wait for an mdtraj fix (I'm going to poke upstream to see if we can get a fix in)
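
To make option 2 concrete, here is a rough sketch of a "decimal, then hex" scheme for a five-column field. It assumes the convention is simply to write the hexadecimal value once the number no longer fits in decimal; that is my reading of the description above, not a confirmed specification of what VMD writes, and both function names are hypothetical. The base-16 fallback matches the kind of parse reported in the warning, but whether mdtraj handles such files correctly is not something this sketch establishes.

```python
def format_serial_dec_or_hex(n: int, width: int = 5) -> str:
    """Write n in decimal if it fits in the field, otherwise in hex (assumed convention)."""
    if n < 0:
        raise ValueError("serial numbers must be non-negative")
    text = str(n) if n < 10 ** width else format(n, "x")
    if len(text) > width:
        raise ValueError(f"{n} does not fit in {width} columns")
    return text.rjust(width)


def parse_serial_dec_or_hex(field: str) -> int:
    """Try decimal first, then fall back to base 16."""
    text = field.strip()
    try:
        return int(text)
    except ValueError:
        return int(text, 16)


print(format_serial_dec_or_hex(100000))  # 186a0
print(parse_serial_dec_or_hex("186a0"))  # 100000

# Caveat: a hex field made only of digits is indistinguishable from decimal.
print(format_serial_dec_or_hex(100096))  # 18700
print(parse_serial_dec_or_hex("18700"))  # 18700, not 100096
```

The last two lines show one reason such a scheme can be fragile: a reader that looks at each field in isolation cannot tell a digits-only hex value from a decimal one, so a robust parser would need extra context such as the position of the record in the file.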

As for the job_logs warnings:
The warnings do not crash the whole WE simulation, so you can keep using just parts of the HDF5 Framework as you currently do (just $WEST_TRAJECTORY_RETURN, just $WEST_RESTART_RETURN, or not saving logs at all). Personally, I'm open to changing this behavior, but deciding how to prioritize error modes is highly technical, so it should be discussed in the (eventual) GH Issues thread.

Best,

Jeremy L.

Ashlin James Poruthoor

Jan 16, 2024, 3:36:58 PM
to westpa-users
Hi Lillian and Jeremy,

Thank you for your input. I will post this as a GH issue and discuss it over there!

Ashlin