Managing a large number of files in WESTPA simulations


Anupam Anand Ojha

Mar 5, 2025, 1:10:46 PM
to westpa-users

Dear WESTPA users,


I am currently running a WESTPA simulation, and the number of generated files is becoming a significant issue. I am hitting the system limit of 5 million files (quickly, at around the 1000th iteration), which is affecting my ability to continue the simulation.
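
The figures above imply roughly 5,000,000 / 1,000 = 5,000 files per iteration on average. Before picking a cleanup strategy, it can help to see where those files come from. The Python sketch below tallies entries per iteration directory. It assumes a `traj_segs/<iteration>/<segment>/` layout with zero-padded names (e.g. `traj_segs/000001/000003/seg.log`), which is an assumption about your `runseg.sh`, not something stated in this thread. Subdirectories are counted along with files because both typically consume inodes toward a file quota.

```python
import os
from collections import Counter
from pathlib import Path


def count_entries_per_iteration(traj_root: str) -> Counter:
    """Count files plus subdirectories under each top-level iteration directory.

    Assumes a layout like traj_segs/<iteration>/<segment>/...; adjust if yours differs.
    Regular files sitting directly in traj_root (e.g. archives) are not counted.
    """
    counts: Counter = Counter()
    for iter_dir in sorted(p for p in Path(traj_root).iterdir() if p.is_dir()):
        total = 0
        # os.walk visits every nested directory; add up what each one contains.
        for _dirpath, dirnames, filenames in os.walk(iter_dir):
            total += len(dirnames) + len(filenames)
        counts[iter_dir.name] = total
    return counts
```

For example, `counts = count_entries_per_iteration("traj_segs")` followed by `sum(counts.values()) / len(counts)` gives the average number of entries per iteration, and `counts.most_common(5)` shows the heaviest iterations.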


To manage the growing number of files while keeping the simulation functional, I have identified that the seg.log files can be safely removed for now. Since deleting log files alone may not be enough to stay within the file limit, I am exploring other approaches and would appreciate any insights from the community:


Are any of the following options currently available?


  1. Compressing or archiving trajectory segments (traj_segs) from earlier iterations
  2. Using tar or zip to group old files together
  3. WESTPA configuration options that allow automatic file cleanup after a certain number of iterations
  4. Storing trajectory data in an HDF5 database to reduce the number of individual files
  5. Moving old iterations to a separate storage location
  6. Deleting any other files within the traj_segs folder that won't affect the simulation

If anyone has experience dealing with a similar issue, I would greatly appreciate any suggestions or best practices for managing file growth while keeping WESTPA uninterrupted.

Leung, Jeremy

Mar 5, 2025, 1:31:43 PM
to westpa...@googlegroups.com
Hi Anupam,

You can tar up the trajectories at the post-iteration step. This step is specified in `west.cfg`, which calls `post_iter.sh` (https://github.com/westpa/westpa_tutorials/blob/main/tutorial7.1-basic-nacl/west.cfg#L58-L61). See the attached file (remove the `.txt` extension and place it in westpa_scripts); it tars up the second-to-last iteration. This script also contains lines to tar up your seg_logs. Alternatively, you could use the HDF5 Framework (see tutorial 7.5) to consolidate the files.
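
The attached shell script is not reproduced here, but the same idea (options 1 and 2: pack an old iteration into a single archive) can be sketched in Python with the standard `tarfile` module. The zero-padded directory name (`width=6`, e.g. `000998`) is an assumption about your layout, and the function name is hypothetical. The original directory is removed only after the archive has been written without error.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Optional


def archive_iteration(traj_root: str, n_iter: int, width: int = 6) -> Optional[Path]:
    """Pack traj_root/<n_iter> into one .tar.gz file, then delete the original directory.

    Assumes zero-padded iteration directory names (width=6, e.g. "000998").
    Returns the archive path, or None if that iteration directory does not exist.
    """
    root = Path(traj_root)
    name = f"{n_iter:0{width}d}"
    src = root / name
    if not src.is_dir():
        return None
    archive = root / f"{name}.tar.gz"
    # tar.add recurses into src; if writing the archive raises, rmtree is never reached.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=name)
    shutil.rmtree(src)
    return archive
```

Called from a post-iteration hook as, say, `archive_iteration("traj_segs", current_iter - 2)`, this turns thousands of per-segment files into one file per iteration. The data stays recoverable with `tar -xzf 000998.tar.gz` if you need it for analysis later.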

Deleting files in traj_segs is generally OK, as long as you keep the restart files from the last iteration, which the currently running iteration needs (some suggest also keeping those from two iterations back; your choice). Obviously, this may affect your ability to do post-simulation analysis. You may also want to remove any intermediate files (such as per-segment analysis outputs) once they are already saved as auxdata (example: https://github.com/westpa/westpa_tutorials/blob/main/tutorial7.1-basic-nacl/westpa_scripts/runseg.sh#L30).
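
Following that advice, a minimal Python sketch of option 6 deletes everything in an older iteration's segment directories except the files you choose to keep. The default keep-list `("seg.rst",)` is a hypothetical restart-file name; substitute whatever your `runseg.sh` actually writes, or pass an empty tuple for iterations you no longer need at all.

```python
from pathlib import Path
from typing import Iterable


def prune_iteration(iter_dir: str, keep: Iterable[str] = ("seg.rst",)) -> int:
    """Delete every file (and symlink) under iter_dir whose name is not in `keep`.

    "seg.rst" is a hypothetical restart-file name; use the one your runseg.sh writes.
    Directories are left in place. Returns the number of entries removed.
    """
    keep_names = set(keep)
    removed = 0
    # Materialize the listing first so nothing is deleted while the walk is in progress.
    for path in list(Path(iter_dir).rglob("*")):
        if (path.is_file() or path.is_symlink()) and path.name not in keep_names:
            path.unlink()
            removed += 1
    return removed
```

Note that the emptied segment directories still occupy inodes. If an iteration is entirely disposable, removing its directory outright frees more entries than pruning file by file.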



In short, yes: all 6 options you proposed are doable (and have been done before). I would do any of them in post_iter.sh, or manually afterwards.
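
For option 3 (automatic cleanup after a certain number of iterations), the main decision a post-iteration hook has to make is which iteration directories are old enough to touch. The Python sketch below makes that choice with a `keep_recent` window. It assumes the hook runs with `WEST_SIM_ROOT` and `WEST_CURRENT_ITER` in its environment, as the tutorial scripts rely on; verify this for your setup. It also assumes purely numeric iteration directory names under `traj_segs`. The cleanup action itself is left as a placeholder, and the script could be invoked from `post_iter.sh` under a hypothetical name such as `westpa_scripts/cleanup_old_iters.py`.

```python
import os
import re
from pathlib import Path
from typing import List

# Iteration directories are assumed to have purely numeric names, e.g. "000123".
ITER_NAME = re.compile(r"\d+")


def old_iteration_dirs(traj_root: str, current_iter: int, keep_recent: int = 2) -> List[Path]:
    """Return iteration directories numbered <= current_iter - keep_recent, oldest first.

    With keep_recent=2, the current iteration and the one before it are never returned.
    """
    cutoff = current_iter - keep_recent
    return sorted(
        (p for p in Path(traj_root).iterdir()
         if p.is_dir() and ITER_NAME.fullmatch(p.name) and int(p.name) <= cutoff),
        key=lambda p: int(p.name),
    )


def main() -> None:
    current = os.environ.get("WEST_CURRENT_ITER")
    if current is None:
        print("WEST_CURRENT_ITER is not set; run this from the post-iteration hook.")
        return
    sim_root = os.environ.get("WEST_SIM_ROOT", ".")
    for d in old_iteration_dirs(os.path.join(sim_root, "traj_segs"), int(current)):
        # Placeholder: archive, prune, or move `d` here according to your policy.
        print(f"would clean up {d}")


if __name__ == "__main__":
    main()
```

Keeping the selection separate from the action makes it easy to dry-run the hook (it only prints) before letting it delete or move anything.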

Best,

Jeremy L.

---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]


post_iter.sh.txt

Anupam Anand Ojha

Mar 5, 2025, 1:53:52 PM
to westpa...@googlegroups.com
Thank you, Jeremy, for such a quick response.



--
Best regards,

A. Anand Ojha
Flatiron Research Fellow
Center for Computational Biology & Center for Computational Mathematics
Flatiron Institute, New York