Using MDAnalysis and PMDA for a large number of trajectories, each of size 5GB

Amir Sadeghi

Apr 16, 2021, 9:26:40 AM
to MDnalysis discussion
Hello MDAnalysis community,

I study a simple system (a large bead-on-spring polymer crowded by many LJ beads). I have two types of LAMMPS data file and trajectory pairs.

1. The first type is a large number of trajectories (~400), each 5GB in size and paired with a small (~100KB) data file. How can I use MDAnalysis effectively to analyze them? I have two main for-loops: one over the trajectories and the other over the 70,000 frames within each trajectory. For each snapshot, I measure 10 different quantities.

2. The second type is chunked trajectories. Here, each trajectory is split into 14 smaller trajectories of size 1.5GB, so one whole trajectory is around 17GB. How can I run the same analysis as above on these trajectories? Here there are ~400*14 files.
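
For the chunked case, MDAnalysis accepts a list of trajectory files in `Universe(...)` and reads them as one concatenated trajectory, so the frame loop can stay the same as for a single file. A minimal sketch follows. The file-name pattern (`<run>.chunk<N>.lammpstrj`), the `DATA` and `LAMMPSDUMP` formats, and the helper names `group_chunks` and `load_run` are assumptions for illustration, not taken from the actual files.

```python
import re
from collections import defaultdict
from pathlib import Path


def group_chunks(paths, pattern=r"(?P<run>.+)\.chunk(?P<idx>\d+)\.lammpstrj$"):
    """Map each run name to its chunk files, ordered by numeric chunk index.

    The naming pattern is hypothetical; adapt the regex to the real file names.
    Files that do not match the pattern are ignored.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for p in paths:
        m = regex.match(Path(p).name)
        if m:
            groups[m.group("run")].append((int(m.group("idx")), str(p)))
    # Sort numerically so that chunk10 comes after chunk2.
    return {run: [p for _, p in sorted(items)] for run, items in groups.items()}


def load_run(data_file, chunk_files):
    """Open all chunks of one run as a single concatenated trajectory."""
    import MDAnalysis as mda  # imported lazily so group_chunks works without it

    # Assumed formats: LAMMPS data file for topology, LAMMPS dump for trajectory.
    return mda.Universe(
        data_file, chunk_files, topology_format="DATA", format="LAMMPSDUMP"
    )
```

If consecutive chunks repeat a boundary frame, that frame would appear twice in the concatenated trajectory, so it is worth checking the frame count against the expected 70,000.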

I have access to many GPU and CPU cores on an HPC cluster.

Thank you in advance for your responses.
Amir

Tamas Hegedus

Apr 16, 2021, 1:35:42 PM
to mdnalysis-...@googlegroups.com
Amir,

I suggest writing a serial version first and presenting it with some benchmarks. For example, run the analysis on only 500 frames of 5 trajectories instead of all 70,000 frames. Try to identify the bottleneck and then ask more specific questions.
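
A rough sketch of such a benchmark, which times frame reading separately from each measured quantity so that IO cost and compute cost can be compared. The function name `profile_frames` and the example selection are hypothetical; the function only assumes that the trajectory is iterable, which holds for `u.trajectory` in MDAnalysis.

```python
import time
from itertools import islice


def profile_frames(frames, analyses, max_frames=500):
    """Time frame reading and each analysis over the first max_frames frames.

    frames   : any iterable of frames (e.g. u.trajectory)
    analyses : dict mapping a label to a callable that takes one frame
    Returns (frames_processed, read_seconds, {label: seconds}).
    """
    per_analysis = {name: 0.0 for name in analyses}
    read_seconds = 0.0
    n_done = 0
    it = iter(islice(frames, max_frames))
    while True:
        # Time spent fetching the next frame (disk IO plus parsing).
        t0 = time.perf_counter()
        try:
            frame = next(it)
        except StopIteration:
            break
        read_seconds += time.perf_counter() - t0

        # Time spent in each measured quantity for this frame.
        for name, func in analyses.items():
            t0 = time.perf_counter()
            func(frame)
            per_analysis[name] += time.perf_counter() - t0
        n_done += 1
    return n_done, read_seconds, per_analysis


# Possible use with MDAnalysis (file names and selection are assumptions):
#   u = mda.Universe("run.data", "run.lammpstrj", format="LAMMPSDUMP")
#   polymer = u.select_atoms("type 1")
#   n, read_s, per = profile_frames(
#       u.trajectory, {"rg": lambda ts: polymer.radius_of_gyration()}
#   )
```

If `read_seconds` dominates the total, faster storage or fewer passes over the file should matter more than adding cores.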

However, I do not think that parallelization will help much, since you cannot hold all of the data in RAM and have to reload it from disk several times. Disk IO will be the bottleneck. As a first thought, the best you can do is to run everything from an SSD. Move your trajectories to an SSD, which is roughly 10x faster than a regular hard disk. Or, if you have access to a huge amount of RAM, load all of your data into memory...
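
Whether IO really caps the throughput can be checked by running one task per trajectory and timing the whole batch with different worker counts. A minimal sketch using the standard library is below; `analyze_all` and the `analyze_one(data_file, traj_file)` callable are hypothetical names, and this is not the PMDA API.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def analyze_all(jobs, analyze_one, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Run analyze_one(data_file, traj_file) for every job in parallel.

    jobs         : iterable of (data_file, traj_file) pairs
    analyze_one  : user-supplied function; with ProcessPoolExecutor it must be
                   a module-level function so that it can be pickled
    executor_cls : ThreadPoolExecutor can be swapped in for quick checks
    Returns {traj_file: result}.
    """
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {
            pool.submit(analyze_one, data, traj): traj for data, traj in jobs
        }
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker.
            results[futures[fut]] = fut.result()
    return results
```

If the wall time stops improving after a few workers, the disk is the likely limit, and each worker should open its own `Universe` inside `analyze_one` rather than share one.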
!!! Important: if you do not use all of your beads in the analysis, you can load only the data for the beads you need, which requires less memory.
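
To illustrate the memory saving, here is a sketch that estimates the RAM needed for the coordinates of a subset of beads and copies only that subset into a NumPy array. The helper names and the example bead count are assumptions; the real number of polymer beads is not given in this thread.

```python
def positions_memory_gib(n_frames, n_atoms, bytes_per_coord=4):
    """RAM in GiB for float32 x, y, z coordinates of n_atoms over n_frames."""
    return n_frames * n_atoms * 3 * bytes_per_coord / 1024**3


def collect_positions(universe, selection, start=None, stop=None, step=None):
    """Copy the positions of the selected atoms into one in-memory array.

    Returns an array of shape (n_frames, n_selected_atoms, 3).
    """
    import numpy as np  # imported lazily so the estimate works without NumPy

    atoms = universe.select_atoms(selection)
    frames = universe.trajectory[start:stop:step]
    coords = np.empty((len(frames), atoms.n_atoms, 3), dtype=np.float32)
    for i, _ in enumerate(frames):
        # atoms.positions reflects the current frame; assignment copies it.
        coords[i] = atoms.positions
    return coords


# Hypothetical example: 70,000 frames of a 1,000-bead polymer.
estimate = positions_memory_gib(70_000, 1_000)
print(f"{estimate:.2f} GiB")
```

Every frame still has to be read from disk once, but all later passes over the selected beads can then run from memory.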

Bests,
Tamas