Double free or corruption (out)

Nils Gudat

Sep 26, 2015, 12:39:15 PM
to julia-users
Does anyone have an idea what this could be? This occurred in the middle of a minimization routine that had previously converged successfully (with slightly different parameter values).
This is on Julia 0.4.0 (one of the last commits before rc1) on Ubuntu 14.04.3.

[Screenshots 1 and 2: error message and stack trace; images not preserved]

Tony Kelman

Sep 26, 2015, 12:57:47 PM
to julia-users
Please provide code that can reproduce the problem.

Nils Gudat

Sep 26, 2015, 1:07:43 PM
to julia-users
That's the problem I alluded to in my question: this happened in the middle of a very lengthy minimization problem, which had been running for a couple of hours. On a previous run, a very similar version of the code finished successfully after about 10 hours. I was hoping that someone could at least tell me what this error message means; it seems to be Linux-related and I have no clue what's going on.

Bill Hart

Sep 26, 2015, 1:44:48 PM
to julia-users
malloc and free are the functions that allocate and release blocks of memory. They are provided by the system's C library (e.g. glibc on Linux).

A "double free or corruption" error most likely means either that free was called twice on the same block of memory, or that memory was overwritten that shouldn't have been, e.g. by an array overrun or something similar.

This might have happened deep within Julia itself or in some C library that your code calls.

As an absolute guess based on the output you posted: some finalizer is trying to call a free or cleanup function on data from a C library but is passing invalid pointers to it... or there is a bug in the C library itself.
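To make the "free called twice" scenario concrete: one common defensive pattern is to make the cleanup routine idempotent, so that a finalizer running after an explicit cleanup (or running twice) cannot release the same block a second time. The sketch below uses Python with a hypothetical `Handle` class and a stand-in `release_fn`; it only illustrates the idea and is not how Julia, NLopt, or any other package actually implements its finalizers. It also cannot help if the bug is inside the C library itself.

```python
class Handle:
    """Hypothetical wrapper around a resource owned by a C library.

    `release_fn` stands in for the library's free/cleanup function. The
    wrapper guarantees it runs at most once, whether cleanup is triggered
    by an explicit close() or later by the finalizer (__del__).
    """

    def __init__(self, resource, release_fn):
        self._resource = resource
        self._release_fn = release_fn

    def close(self):
        # Detach the resource *before* releasing it, so any later call sees
        # None and does nothing (the analogue of setting a C pointer to NULL
        # right after free()).
        resource, self._resource = self._resource, None
        if resource is not None:
            self._release_fn(resource)

    def __del__(self):
        # Finalizer path: harmless if close() has already run.
        self.close()


freed = []                               # records every real "free"
h = Handle("block-0x1234", freed.append)
h.close()                                # explicit cleanup
h.close()                                # accidental second call: ignored
del h                                    # finalizer runs: also ignored
print(freed)                             # ['block-0x1234'] in CPython
```

Without the reset in `close()`, the second call would pass the same block to `release_fn` again, which for a real C `free` is exactly the kind of double free glibc reports.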

I'm sorry I don't know anything about the "minimization" you are speaking of. I'm not a numerical person. And I don't recognise any of the libraries mentioned in your stack trace (other than libjulia.so).

But does this information help in any way?

Tracking such things down can be very difficult. If you make a pile of much smaller examples, can you get the same thing to happen repeatedly with similar code?

Bill.

Yichao Yu

Sep 26, 2015, 1:45:24 PM
to Julia Users
The error message means that something corrupted the memory. The most common cause I've seen is an incorrectly used ccall (or other unsafe memory stores).
What packages are you using? Do you at least have a list of the ones that use ccall?

Nils Gudat

Sep 26, 2015, 2:37:08 PM
to julia-users
The minimization itself uses NLopt; the problem is to solve an economic model (which takes around 2 minutes to solve on 16 cores) and compare its output (a 100x4 Float64 Array) to some data moments. The model results depend on two parameters. The model itself is mostly minimization (via Optim) and numerical integration (using FastGaussQuadrature), and is parallelized via SharedArrays.

(Since you asked for a list of packages, I'm also using ApproXD for linear interpolation, and Distributions to draw from a bivariate Normal).

Yichao Yu

Sep 26, 2015, 4:53:56 PM
to Julia Users
Looks like there's at least one segfault in NLopt (AppVeyor Nightly Win32), and I can reproduce it locally with aggressive GC. Will investigate.

Yichao Yu

Sep 26, 2015, 7:44:26 PM
to Julia Users
> Looks like there's at least one segfault in NLopt (AppVeyor Nightly
> Win32) and I can reproduce locally with aggressive GC. Will
> investigate.

Fixed in https://github.com/JuliaLang/julia/pull/13325
I have no idea whether it is the same segfault/corruption you are seeing, or the one on AppVeyor, though.

Nils Gudat

Sep 29, 2015, 4:25:22 AM
to julia-users
Thanks for that. I've updated my version to the latest 0.5 master, but am now getting this segfault, which looks like it's still connected to garbage collection:

[Screenshot: segfault backtrace; image not preserved]

Yichao Yu

Sep 29, 2015, 8:07:17 AM
to Julia Users
On Tue, Sep 29, 2015 at 4:25 AM, Nils Gudat <nils....@gmail.com> wrote:
> Thanks for that, I've updated my version to the latest 0.5 master, but am now getting this segfault, which looks like it's still connected to garbage collection:

Not necessarily. The GC is just one of the pieces of code most vulnerable to memory corruption.

The backtrace itself is basically useless. Running the code with a few debug options may help, but it is not easy to describe how to debug this. (I'm planning to update the GC debugging docs but haven't gotten to it yet.)

If the issue is reproducible enough (total time to reproduce < 1 week), it would be helpful to post your full code. If you don't want to make it public, please feel free to send a private email.
Nils Gudat

Sep 29, 2015, 8:43:31 AM
to julia-users
The code usually segfaults after 2-5 hours, and is available at http://github.com/nilshg/LearningModels, however I haven't written it up in a way that is easy to run (right now it depends on some data not included in the repo), so I'll have to restructure a bit before you can run it. I'll try to do so today if I find the time.

Nils Gudat

May 31, 2016, 6:52:07 AM
to julia-users
Resurrecting this very old thread: after I had been able to solve the model without segfaults for the last couple of months, they have now returned and occur much faster (usually within 2 hours of running the code).
I have refactored the code a little so that it will hopefully be possible for others to run it. After cloning the entire repo at http://github.com/nilshg/LearningModels, it should run once the path in https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl is changed to the location it was cloned to.

I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5 installed and all packages on the latest tagged versions.

Bill Hart

May 31, 2016, 7:25:33 AM
to julia-users
We are also suddenly getting crashes with 0.4.5 when running our (Nemo) test suite. The error says that a memory allocation is failing due to "invalid next size". I suspect a bug that wasn't there until the last few days, since we were passing just fine on Travis, though at this stage I haven't checked whether we are still passing there.

Bill.

Bill Hart

Jun 1, 2016, 11:00:50 AM
to julia-users
I've checked that the problem we were having doesn't happen with Julia 0.4.5 on Travis. In fact, it also doesn't happen on another one of our systems with Julia 0.4.5, so at this stage we have no idea what the problem is. It may be totally unrelated to the problem you are having.

Bill.

Nils Gudat

Jun 2, 2016, 3:45:24 AM
to julia-users
Fair enough. Does anyone have any clues as to how I would go about investigating this? As has been said before, the stacktraces aren't very helpful for segfaults, so how do I figure out what's going wrong here?

Andrew

Jun 2, 2016, 11:51:03 AM
to julia-users
Have you tried running the code without using parallel? I have been getting similar errors in my economics code. It segfaults sometimes, though not always, after a seemingly random amount of time, sometimes an hour or so, sometimes less. However, I don't recall it having ever occurred in the times I've run it without parallel. I'm using SharedArrays like you. I've seen this occur on both 0.4.1 and 0.4.5.

The error isn't too serious for me because I periodically save the optimization state to disk, so I can just restart.
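Periodically saving state so that a crashed run can resume is ordinary checkpoint/restart. A minimal sketch in Python, assuming the optimizer state fits in a JSON-serializable dict; the function names, the file name, and the toy objective are hypothetical and not part of NLopt, Optim, or Andrew's code:

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` (a JSON-serializable dict) to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # ...then swap it in, so a crash mid-write never leaves a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def objective(x):
    # Toy stand-in for one expensive model solve.
    return (x - 1.0) ** 2


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "optim_checkpoint.json")
    # On a restart this picks up where the previous run stopped.
    state = load_checkpoint(path, {"iteration": 0, "best_x": None, "best_f": None})
    for it in range(state["iteration"], 3):
        x = float(it)
        f_val = objective(x)
        if state["best_f"] is None or f_val < state["best_f"]:
            state["best_x"], state["best_f"] = x, f_val
        state["iteration"] = it + 1
        save_checkpoint(path, state)  # checkpoint after every evaluation
    restored = load_checkpoint(path, None)

print(restored)
```

Writing to a temporary file and swapping it in with `os.replace` means that a crash in the middle of a save leaves the previous checkpoint readable instead of half-written.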

I also can't remember this ever occurring on my own (Linux) computer. It's happened on a (Linux) cluster with many cores.  

Nils Gudat

Jun 2, 2016, 1:52:43 PM
to julia-users
Hm, interesting observation... I suppose the issue in my case is that the code as it is takes about 3-4 days to complete, so running it on 1 instead of 15 cores means I'm unlikely to ever get my PhD!
I will at least try to run a shorter version that might be solvable in a day or two without parallel.

Nils Gudat

Jun 12, 2016, 4:36:27 AM
to julia-users
So it looks like I'm having the same issue - have been running the code without parallelization (defining my SharedArrays as regular ones), and it has now been going for about 3 days without any segfaults. Is this a known issue? If so, do we know whether there's a Julia version one can revert to in which SharedArrays work?