Benchmarks for amber topology parser

julien

Mar 4, 2015, 9:05:31 AM

to sire-de...@googlegroups.com

Hi,

I am posting some benchmarks Hannes sent me on the time it takes to load up a Sire system from amber topology/crd files so we can have an archive of any discussion here.

> #   61153  121308  193177  465399
> C       5      20      47     210
> B       8      29      74     309
> A       0       2       6     154
> D       0       0       2      17
> N       2       6      15     252
> T      22      68     168    1006
>
> # = number of atoms
> C = setConnectivity
> B = setBonds
> A = setAngles
> D = setDihedrals
> N = setNonBondedPairs
> T = total including above terms and all other code
> times in seconds
>
> I can't really make sense of this because in the largest system the
> angles and non-bonded times suddenly go up. I also tried a system with
> 1276000 atoms but after building 329000 molecules Sire terminated with
> a MemoryError. So the problem is not only runtimes but also memory
> foot print.

My first reaction is to check the timings for the large system without water molecules, and see whether it still takes longer to set up bonds and angles than dihedrals. There should be fewer bonds, so it should be faster.
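
As a quick way to read the table, the apparent scaling exponent k (assuming time grows roughly as n**k between two consecutive system sizes) can be computed directly from the posted numbers. This is only a sketch: the function name local_exponents is made up for illustration, and a zero timing is reported as None because the exponent is undefined there.

```python
import math

# Numbers copied from the benchmark table in this post (times in seconds).
atoms = [61153, 121308, 193177, 465399]
timings = {
    "setConnectivity": [5, 20, 47, 210],
    "setBonds": [8, 29, 74, 309],
    "setAngles": [0, 2, 6, 154],
    "setDihedrals": [0, 0, 2, 17],
    "setNonBondedPairs": [2, 6, 15, 252],
    "total": [22, 68, 168, 1006],
}


def local_exponents(sizes, times):
    """Return k in t ~ n**k for each pair of consecutive system sizes."""
    exponents = []
    for i in range(len(sizes) - 1):
        n1, n2 = sizes[i], sizes[i + 1]
        t1, t2 = times[i], times[i + 1]
        if t1 <= 0 or t2 <= 0:
            exponents.append(None)  # undefined when a timing rounds to zero
        else:
            exponents.append(math.log(t2 / t1) / math.log(n2 / n1))
    return exponents


for name, times in timings.items():
    ks = local_exponents(atoms, times)
    print(name, [None if k is None else round(k, 2) for k in ks])
```

An exponent that jumps between the last two sizes points at something other than the algorithm itself changing, since the code path is the same for all four systems.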

For the memory, Hannes, can you check the peak memory use during loading, and how much memory Sire is using after the loading has completed? This should help identify whether we are accumulating temporary variables, or whether the issue is simply that it takes too much memory to store large systems as Sire objects.
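
One possible way to get both numbers from inside the script is the Python standard library alone. This is a sketch under a few assumptions: it is Unix-only (the resource module), the "after loading" figure is read from /proc and so is only available on Linux, and fake_load is a hypothetical stand-in that would be replaced by the real loader call.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def current_rss_mb():
    """Current resident set size in MB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as handle:
            resident_pages = int(handle.read().split()[1])
    except OSError:
        return None
    return resident_pages * resource.getpagesize() / (1024 * 1024)


def report_memory(load):
    """Run load() and return (result, peak MB, MB still held afterwards)."""
    result = load()
    return result, peak_rss_mb(), current_rss_mb()


# Hypothetical stand-in for the real loader, so the sketch runs on its own.
def fake_load():
    return [0] * 1_000_000


system, peak, after = report_memory(fake_load)
print("peak RSS (MB):", peak)
print("RSS after loading (MB):", after)
```

If the peak is much larger than the figure after loading, temporaries are the likely culprit; if the two are close, the stored objects themselves account for the memory.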

Chris, I would be interested to know what you think might be slow in the SireIO::Amber subroutines listed above. There is no obvious bottleneck, so perhaps it is down to the overhead of creating the various Sire objects needed to describe the system. Do you think it would make sense to consider a solution like OpenMP to spread the parameterisation of different molecules across cores?

Best regards,

Julien

Christopher Woods

Mar 9, 2015, 5:17:39 AM

to Sire Developers

Hi Julien,

Thanks for the benchmarks. From the data it looks like things are
being slowed down because Sire is using lots of memory and is causing
the machine to swap like crazy on the large system. Sire is very
memory hungry and was not really designed for hundreds of thousands of
atoms... You can get information about the memory allocation in Sire
using the following code:

from Sire.Base import *
m = MemInfo()
# print the memory usage every 1000 ms from a background monitor
m.startMonitoring(1000)

This will create a memory monitor that will print out the memory usage
every 1000 ms (1 second). This will run in the background, printing to
the screen, so you can watch Sire's memory consumption grow while you
are running a script. Two numbers are printed: the first is the total
amount of memory that has been given to Sire by the operating system,
and the second is the actual amount of memory that Sire is using (if
you allocate 1 MB of memory in an array, but only place items in the
first 100 KB of the array, then these two numbers would be 1 MB and
100 KB).

Another thing to realise when you are benchmarking Amber is that
reading in large molecules (e.g. proteins) is significantly more
costly than reading in small molecules (e.g. water). To get an idea of
scaling it would be better to benchmark the loading of boxes of
identical molecules of different sizes (e.g. 100 water molecules, 1000
water molecules, 10000 water molecules etc.). Then try this for
different sizes of molecule, e.g. (100 octane molecules, 1000 octane
molecules etc.). This will give you an idea of the scaling with
respect to number of molecules, so that you can see where the
bottleneck is located as the number of molecules increases.
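
A minimal timing harness for this kind of scan could look like the
sketch below. Everything named here is hypothetical: time_loader is
just an illustration, fake_load stands in for the real loader call,
and the dictionary values stand in for the prepared input files of
each box size.

```python
import time


def time_loader(load, inputs, repeats=3):
    """Return {label: best wall-clock seconds} for load(arg) over inputs."""
    results = {}
    for label, arg in inputs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            load(arg)
            best = min(best, time.perf_counter() - start)
        results[label] = best
    return results


# Hypothetical stand-in for the real loader: does work proportional to
# the number of molecules it is given.
def fake_load(n_molecules):
    return [i * i for i in range(n_molecules * 10)]


# Hypothetical inputs: label -> argument passed to the loader. With the
# real loader these would be the files for each box of identical molecules.
inputs = {
    "100 waters": 100,
    "1000 waters": 1000,
    "10000 waters": 10000,
}

for label, seconds in time_loader(fake_load, inputs).items():
    print("%-14s %.6f s" % (label, seconds))
```

Taking the best of a few repeats reduces noise from other processes;
running the same scan for a second molecule type (e.g. octane) then
separates the cost per molecule from the cost per atom.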

In terms of loading in parallel, yes, you could try to parallelise the
loader. However, if you are memory limited, then a parallel loader
would be just as slow (if not slower) as the cost is the amount of
swapping going on because too much memory is being used.

Cheers,

Christopher

> --
> You received this message because you are subscribed to the Google Groups
> "Sire Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sire-develope...@googlegroups.com.
> To post to this group, send email to sire-de...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sire-developers.
> For more options, visit https://groups.google.com/d/optout.

--
---------------------------------------------------------
Christopher Woods
+44 (0) 7786 264562
http://chryswoods.com

julien

Mar 9, 2015, 7:31:07 AM

to sire-de...@googlegroups.com

Hi Chris,

Thanks for the comments. I think the problem for Hannes is that his goal is to process a lot of input files quickly and cheaply, but he almost certainly doesn't need a full Sire setup since he doesn't do MC/MD simulations etc. Sire is mostly useful for the parts that parameterise the ligands, since the library contains the code needed to find degrees of freedom, and also to build dummy atom coordinates for the mapping code.

Hannes, could you check every instance where you use amber.readCrdTop() in FESetup as it stands? The bottleneck should be when dealing with the solvated complex. What exactly does the code do with the solvated protein-ligand complex that would be hard to do with a simpler, more efficient parser?

Best,

Julien
