Help Finding Possible Memory Leak/Issue


Andrew Emerick

Jun 2, 2020, 9:28:37 AM
to enzo...@googlegroups.com
Hi all,

If anyone is up for a good challenge (and the reason why I got a bit behind on working on my Enzo-E PR's...):

I've been trying to track down a possible memory leak for the past week or so (in Enzo) but haven't really been able to make any solid progress on definitively diagnosing the issue. This is in large part because I can only reproduce the seg-fault (which is either a "free(): invalid pointer" or a "double free or corruption") on a large run (256^3 with 9 levels and with radiative transfer), and it only occurs 1) when running with optimization O1 or O2 (though a problem may still exist in O0), and 2) when running on > 2 processors.

I've spent a very long time using both valgrind and address sanitizer to try and get to the bottom of the issue on smaller problems, but with no luck. This is partly because I'm pretty unfamiliar with the routines that seem to be possibly causing the problem. I was wondering if anyone has experience with the below bits of code to say if there is indeed an issue or not:

1) I can get a core dump when the seg-fault occurs. Most of the core files contain no backtrace (just something like "<class 'gdb.MemoryError'> Cannot access memory at address 0x7ffe2ca17718:"), but usually one has a backtrace (https://pastebin.com/zVV53hRQ) that points to line 495 in Grid_DepositParticlePositions.C (`delete [] ParticleMassTemp;`). Given that this is a memory error, this isn't necessarily the problem point -- just the point where whatever is going wrong pops up. It looks like this file has been only minimally changed since 2016. I tried initializing pointers to NULL and doing NULL checks before the delete, but this does not change the behavior.

2) Running valgrind locally on a separate problem, I get an explosion of weird memory errors stemming from CommunicationTransferPhotons.C that look something like this: https://pastebin.com/WrDQUHFk. It looks like memory gets corrupted all over the place, giving "Conditional jump or move depends on uninitialised value(s)" in several very unrelated routines (there is nothing special about the grid::ComputePressure in that example... it pops up in many different places). As you can see, Valgrind points to line 162 in CommunicationTransferPhotons.C, where a photon send list is allocated: `SendList[proc] = new GroupPhotonList[nPhoton[proc]];`. But again, not much has changed in this routine lately and I can't see any issues in this code, so I'm not sure whether this is actually the problem point.
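
One thing that may be worth ruling out for the Valgrind report in item 2: `new T[n]` on a plain struct leaves its members uninitialized, so any field that is never written before the list is copied or sent gets flagged later, wherever it is first used in a branch. The sketch below only illustrates that mechanism. `PhotonRecord` and the helper names are hypothetical stand-ins (not Enzo's `GroupPhotonList`), and whether this explains the errors seen here is unconfirmed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a packed photon record; the fields are
// illustrative and do not mirror Enzo's GroupPhotonList.
struct PhotonRecord {
  int   destination_grid;
  int   destination_level;
  float flux;
};

// Default-initialized: members hold indeterminate values until written.
PhotonRecord* alloc_uninitialized(std::size_t n) {
  return new PhotonRecord[n];
}

// Value-initialized (note the trailing parentheses): every member is zeroed.
PhotonRecord* alloc_zeroed(std::size_t n) {
  return new PhotonRecord[n]();
}

// Returns true if every field of every record is zero.
bool all_zero(const PhotonRecord* list, std::size_t n) {
  assert(list != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    if (list[i].destination_grid != 0 || list[i].destination_level != 0 ||
        list[i].flux != 0.0f) {
      return false;
    }
  }
  return true;
}
```

If switching a suspect allocation to the value-initialized form makes the Valgrind warnings disappear, that points at a field that is read before it is filled rather than at heap corruption.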

It's very likely I have something screwy in my fork that is the real source of the issue and entirely unrelated to the above, but I was hoping someone could help me definitively rule out the above two bits of code as the problem points.
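
As a way of ruling a buffer in or out: a NULL check cannot catch this class of error, because a pointer into a corrupted heap is still non-NULL, and `delete []` on NULL is already a no-op. A guard-band check is one alternative. The following is a minimal sketch assuming a float buffer like the one deleted in item 1; `guarded_alloc`, `guards_intact`, and `guarded_free` are hypothetical helpers, not Enzo routines.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard-band helpers for locating buffer overruns.
constexpr std::size_t kGuard   = 8;          // guard elements on each side
constexpr float       kPattern = -12345.0f;  // exactly representable sentinel

// Allocate n usable floats with sentinel-filled guards before and after.
float* guarded_alloc(std::size_t n) {
  float* raw = new float[n + 2 * kGuard];
  for (std::size_t i = 0; i < kGuard; ++i) {
    raw[i] = kPattern;               // leading guard
    raw[kGuard + n + i] = kPattern;  // trailing guard
  }
  return raw + kGuard;
}

// Returns false if anything wrote into either guard region.
bool guards_intact(const float* user, std::size_t n) {
  assert(user != nullptr);
  const float* raw = user - kGuard;
  for (std::size_t i = 0; i < kGuard; ++i) {
    if (raw[i] != kPattern || raw[kGuard + n + i] != kPattern) return false;
  }
  return true;
}

// Release a buffer obtained from guarded_alloc.
void guarded_free(float* user) {
  if (user != nullptr) delete[] (user - kGuard);
}
```

Calling `guards_intact` at a few points between the allocation and the delete narrows down where an overrun happens, provided the overrun lands within the guard width. Writes that land further away are still missed.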

And I'd take any suggestions anyone may have. I've very nearly exhausted all of my normal strategies for tracking something like this down. Unfortunately I'm very limited in how far back in my code history I can go and re-run to track down the problem commit, since the run requires fairly recent additions to work. I've already demonstrated that the issue is still present as far back in the history as I can go.



Best,
Andrew
---
Pasadena Fellow in Theoretical Astrophysics
Carnegie Observatories
California Institute of Technology

David Collins

Jun 2, 2020, 3:08:30 PM
to enzo...@googlegroups.com
Hi, Andrew-

Have you used the overloaded new/delete that James wrote a while back?
I recall it being straightforward; there are only a couple of lines. Then you can pepper the code with memory usage reports.

d.

--
You received this message because you are subscribed to the Google Groups "enzo-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to enzo-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/enzo-dev/CAOo4WKJUzZ9JGdU4Msvz7oe-%3D_1%2BERMmSf0EYyfohc8vZ3OtnA%40mail.gmail.com.


--
-- Sent from a computer.

Andrew Emerick

Jun 2, 2020, 3:15:54 PM
to enzo...@googlegroups.com
I have not, and I was unaware of it, but I’d be very interested in trying it out. Where can I find how to turn it on / use it?



Best 
Andrew 

--

John Wise

Jun 2, 2020, 3:59:20 PM
to enzo...@googlegroups.com
Hi Andrew,

You can turn it on with the compile-time MEMORY_TRACE define. Then you
can put in calls to ReportMemoryUsage() wherever you like. The routines are in
ReportMemoryUsage.C and MemoryAllocationRoutines.C.
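
For anyone unfamiliar with the approach, the general idea of overloading new/delete for tracing can be sketched as below. This is not the code in MemoryAllocationRoutines.C. The `memtrace` namespace and `report()` are hypothetical names, and the sketch assumes a single thread and types with ordinary alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

namespace memtrace {
// Header placed in front of each block; sized so the pointer handed back
// to the caller stays aligned for any fundamental type.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size");

std::size_t live_bytes  = 0;  // bytes currently allocated through new
std::size_t live_blocks = 0;  // blocks currently allocated through new

// Hypothetical reporting hook (not Enzo's ReportMemoryUsage).
void report(const char* label) {
  std::printf("[mem] %s: %zu bytes in %zu blocks\n",
              label, live_bytes, live_blocks);
}
}  // namespace memtrace

// Replacement global allocator: record the request size in the header.
void* operator new(std::size_t n) {
  void* raw = std::malloc(n + memtrace::kHeader);
  if (raw == nullptr) throw std::bad_alloc();
  *static_cast<std::size_t*>(raw) = n;
  memtrace::live_bytes += n;
  ++memtrace::live_blocks;
  return static_cast<char*>(raw) + memtrace::kHeader;
}

// Replacement global deallocator: read the size back and update counters.
void operator delete(void* p) noexcept {
  if (p == nullptr) return;
  void* raw = static_cast<char*>(p) - memtrace::kHeader;
  memtrace::live_bytes -= *static_cast<std::size_t*>(raw);
  --memtrace::live_blocks;
  std::free(raw);
}

// Array and sized forms forward to the two functions above.
void* operator new[](std::size_t n) { return operator new(n); }
void operator delete[](void* p) noexcept { operator delete(p); }
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { operator delete(p); }
```

Calling `memtrace::report("before deposit")` and `memtrace::report("after deposit")` around a suspect routine shows whether the live byte count grows across the call.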

About your memory leaks: I'm not sure what's going on in either of
your cases. I hope you can figure it out with these tracers.

Thanks,
John
> <https://pastebin.com/zVV53hRQ> that points to line 495 in
> Grid_DepositParticlePositions.C
> <https://github.com/enzo-project/enzo-dev/blob/master/src/enzo/Grid_DepositParticlePositions.C>
> (`delete [] ParticleMassTemp;`) . Given this is a memory error
> this isn't necessarily the problem point -- just the point where
> whatever is going wrong pops up. it looks like this file has
> been only minimally changed since 2016. I tried initializing
> pointers to NULL and doing NULL checks before the delete but
> this does not change behavior.
>
> 2) Running valgrind locally on a separate problem, I get an
> explosion of weird memory errors stemming from
> CommunicationTransferPhotons.C
> <https://github.com/enzo-project/enzo-dev/blob/master/src/enzo/CommunicationTransferPhotons.C>
> that look something like this <https://pastebin.com/WrDQUHFk> .

--
John Wise
Associate Professor of Physics
Center for Relativistic Astrophysics, Georgia Tech
http://cosmo.gatech.edu

Andrew Emerick

Jun 3, 2020, 5:31:05 PM
to enzo...@googlegroups.com
Hi John and Dave (and all),

Thanks again for the recommendation. I haven't had much success yet in using the tool, but am still working on it. In the meantime, I have noticed that I can push past the seg-fault (though it isn't obvious whether this truly fixes the problem or just obscures it) if either:

1) I change LoadBalancing from 4 to 1 (which means either that the issue is somewhere in how the Hilbert curve load balancing works... or that this just skips over the particular communication that causes the seg-fault but doesn't actually fix the real problem).

or

2) I change ParticleSubgridDepositMode from 1 (its default) to 0 or 2... Could be the same story here.

But since I'm unfamiliar with this, I'm just wondering again if I'm potentially on the right track / if anyone has had issues here before.


Best,
Andrew
---
Pasadena Fellow in Theoretical Astrophysics
Carnegie Observatories
California Institute of Technology
