Fwd: DA memory allocation


David Ketcheson

May 7, 2013, 9:16:25 AM
to claw-dev


---------- Forwarded message ----------
From: Amal Alghamdi <amal....@kaust.edu.sa>
Date: Tue, May 7, 2013 at 2:42 PM
Subject: DA memory allocation
To: Lisandro Dalcin <dal...@gmail.com>
Cc: david ketcheson <david.k...@kaust.edu.sa>


Hi Lisandro


This is a further analysis of the memory issue in PetClaw that Grady noticed and discussed here:

https://groups.google.com/forum/?fromgroups=#!msg/claw-users/P88vjftjJks/IEEWWj3lKcwJ


I've run a simple script to show the memory usage of a DA object. The script is attached.

The DA size is 100^3 with 3 degrees of freedom, so the corresponding data is expected to take about 24 MB (100^3 points x 3 dof x 8 bytes per double = 24 MB). However, the DA object takes far more space: 192 MB.


There is a previous discussion about space allocated by DA:

http://scicomp.stackexchange.com/questions/2484/memory-footprint-for-dmda-objects-in-petsc


but the memory allocation there is more reasonable than what I'm seeing here.


This is the output of the script I ran (rss is the resident set size in bytes, i.e. the RAM occupied by the process):


program beginning

meminfo(rss=21331968L, vms=2534883328L)


After creating DA object 100^3 with dof=3

meminfo(rss=213979136L, vms=2726912000L)


After creating the corresponding global Vec

meminfo(rss=214073344L, vms=2726912000L)


After creating the corresponding local Vec

meminfo(rss=238084096L, vms=2750914560L)


Memory

meminfo(rss=238084096L, vms=2750914560L)


After calling garbage collector

meminfo(rss=238084096L, vms=2750914560L)


(I've also used the resource package and the memory_profiler tool and got similar results.)


What I'm not sure of is why the DA allocates all this memory, and whether it is freed at some point.


Allocating a 2D DA of the same total size (1000^2) consumes almost the same amount of memory.


Attached is the script I ran.
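
For readers without the attachment, it does roughly the following (a minimal sketch assuming petsc4py and psutil; the actual test2.py may differ in details such as the stencil width):

import gc
import os
import sys

import psutil

import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

proc = psutil.Process(os.getpid())

def report(label):
    # psutil reports rss and vms in bytes
    print(label)
    print(proc.memory_info())

report("program beginning")

# 100^3 DMDA with dof=3; stencil_width=1 is an assumed value
da = PETSc.DMDA().create(dim=3, sizes=(100, 100, 100), dof=3, stencil_width=1)
report("After creating DA object 100^3 with dof=3")

gvec = da.createGlobalVec()
report("After creating the corresponding global Vec")

lvec = da.createLocalVec()
report("After creating the corresponding local Vec")

gc.collect()
report("After calling garbage collector")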


-Amal



This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email.


test2.py

Grady Lemoine

May 7, 2013, 9:46:04 AM
to claw...@googlegroups.com
Wow, this seems to be a bigger issue than I thought.  Thanks, David, for forwarding this, and thanks to Amal for investigating what's going on!

--Grady



Aron Ahmadia

May 7, 2013, 10:17:41 AM
to claw...@googlegroups.com
Measuring heap allocations (using the heap tool on OS X), I see:

72 MB allocated for a 100^3 DMDA 

23 MB for the global vec

23 MB for the local vec

This is a total of 118 MB, of which slightly less than 40% is dof data. All of this is consistent with the doubling of memory usage that Grady observed earlier, as well as with the discussion thread on scicomp: the memory redundancy is coming from PETSc, not PetClaw.

I'd forgotten about this aspect of the DMDA, but I agree that it's a -1 for us, particularly when we're running on small clusters (the factor of 2 is almost lost when we start thinking about supercomputers).

Grady, if this is a show-stopper for you, we can come up with a way to do the communication using direct MPI calls instead of over PETSc's DMDA structure.  We could also talk to the PETSc guys about providing a more lightweight distributed array object.  What are your thoughts?
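
For concreteness, here is a rough sketch (not PetClaw code; the array name and the width-1 ghost region are assumptions) of what a direct-MPI ghost-cell exchange could look like for a 1D slab decomposition, using mpi4py and numpy:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

nlocal = 100   # interior cells owned by this rank (assumed)
ghost = 1      # ghost-cell width (assumed)

# Local array with ghost cells on both ends; only the interior is owned.
q = np.zeros(nlocal + 2 * ghost)
q[ghost:-ghost] = rank

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send our rightmost interior cells to the right neighbor while receiving
# our left ghost cells from the left neighbor, and vice versa.
# PROC_NULL turns the edge cases into no-ops.
comm.Sendrecv(sendbuf=q[-2 * ghost:-ghost], dest=right,
              recvbuf=q[:ghost], source=left)
comm.Sendrecv(sendbuf=q[ghost:2 * ghost], dest=left,
              recvbuf=q[-ghost:], source=right)

The appeal is that a hand-rolled exchange only needs the interior array plus its ghost cells; the cost is re-implementing what the DMDA already provides for arbitrary dimensions and stencils.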

-A

Aron Ahmadia

May 7, 2013, 10:17:50 AM
to claw...@googlegroups.com
And thanks to Amal for digging into this!

A

Grady Lemoine

May 7, 2013, 11:32:13 AM
to claw...@googlegroups.com
For now, I think I'll be okay.  I've been informed that I'm allowed to rent an Amazon EC2 instance if I need it; in the very short term, for my thesis work, I can grab a high-memory EC2 cluster node with 244 GB of RAM and 16 cores, on which I can run a decent-sized grid for the demo problem I was thinking of -- 300x300x600, or maybe 350x350x700.  In the slightly longer term, one of my colleagues could probably get me time on a cluster node at her institution, which is either 128 GB/24 cores or 256 GB/48 cores; I'm not sure which.  Either way, it should be enough to do something worthwhile with the current PetClaw.

Losing that factor of 4 (not 2) memory penalty would be nice, though, if and when possible.  I could run grids that were 50% bigger in all dimensions.  (Though I guess maybe it'd only be possible to recover a factor of 2?  I don't quite get why both a local and a global vec exist; are they both copies of the same data?)  I don't see running my code on supercomputers, or even small clusters, in my future, so I'll probably continue to care about memory usage indefinitely.
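
(For reference on the global/local question: the global Vec holds only the entries a process owns, while the local Vec holds those same entries plus the ghost points, filled by a global-to-local scatter, so the owned data is indeed stored twice. A minimal petsc4py sketch, with illustrative sizes and stencil width:

import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

da = PETSc.DMDA().create(dim=3, sizes=(100, 100, 100), dof=3, stencil_width=1)

gvec = da.createGlobalVec()   # owned (non-overlapping) entries only
lvec = da.createLocalVec()    # owned entries plus ghost points

gvec.set(1.0)

# Scatter owned values (and neighbors' boundary values) into the ghosted
# local vector before applying a stencil; the owned entries now live in
# both vectors.
da.globalToLocal(gvec, lvec)

)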

--Grady