@profile gives Segmentation fault in 0.4.0-rc3


Deniz Yuret

Oct 3, 2015, 3:07:47 PM
to julia-dev
I am testing a large package written on top of JuliaGPU, and I get frequent segfaults with the profiler.  Unfortunately there is no information or error message to tell me what is causing it.  I tried profiling each part of the offending program separately, but when I try to isolate it that way the segfaults disappear.  I am stumped.  Any advice on how to debug this?
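
For context, my usage looks roughly like the sketch below (heavily simplified; work() here is just a hypothetical stand-in for the actual training loop, which runs GPU code):

# hypothetical stand-in for the real training loop
function work(n)
    a = rand(1000, 1000)
    s = 0.0
    for i = 1:n
        s += sum(a * a)    # keep the sampler busy with real work
    end
    return s
end

work(1)              # run once first so compilation is out of the picture
@profile work(100)   # the segfault happens somewhere during a call like this
Profile.print()      # often never reached when it crashes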

thanks,
deniz

Tim Holy

Oct 3, 2015, 3:25:28 PM
to juli...@googlegroups.com
I haven't done this in ages myself, but when I last tried I also saw segfaults
when combining the profiler and CUDArt. My situation was complicated (yours
might be too), in that it involved multiple julia processes, timed waits, etc.
Because of that complexity, I did not make any progress in isolating the
problem.

--Tim

Yichao Yu

Oct 3, 2015, 3:41:02 PM
to Julia Dev
Do you have a self-contained example (possibly using other registered packages) that triggers the segfault?

Deniz Yuret

Oct 3, 2015, 8:16:25 PM
to julia-dev
I could not produce a small example that consistently triggers a segfault (yet).  While trying, I noticed that the segfault always occurs during profiling but not always in the same place in the program.  That brings some bad interaction with gc to mind.  I don't know what else is non-deterministic in Julia.  I also ran julia-debug under gdb, but the segfaults appear all over the place: https://gist.github.com/denizyuret has the first 7 backtraces, each from typing the exact same commands into a fresh julia session.
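
Since gc is one of my suspects, one thing I plan to try next (just an idea, not tested yet) is turning the collector off around the profiled call to see whether the crash disappears, roughly like this (gc_enable taking a Bool is what I remember from 0.4):

gc()                       # collect once up front
gc_enable(false)           # returns the previous setting
@profile run_experiment()  # run_experiment is a placeholder for the offending call
gc_enable(true)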

Of course these are probably not the locations of the offending instruction, just the points where the OS finally notices something is off.  I remember using things like electric fence to trace crashes like this back to the offending instruction a long time ago.  I am not sure what the modern tools are or what works with Julia.  Let me know if I can provide anything else to help chase this bug.

thanks,
deniz

Jameson Nash

Oct 3, 2015, 8:26:01 PM
to juli...@googlegroups.com
some aspects of the backtrace make me think you might not be using llvm-3.3. can you add the output of versioninfo()? libunwind currently won't work on newer versions of llvm, due to https://github.com/JuliaLang/julia/issues/12060, https://github.com/JuliaLang/julia/pull/12380, and https://github.com/JuliaLang/julia/blob/f67f21398754724589cda779c0429ea9fda4b47d/src/codegen.cpp#L5967

Deniz Yuret

Oct 3, 2015, 8:44:52 PM
to julia-dev
Here is the versioninfo:

julia> versioninfo()
Julia Version 0.4.0-rc3
Commit 483d548* (2015-09-27 20:34 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

This was the v0.4.0-rc3 image I downloaded from the download page.  It does not look like llvm is dynamically linked; is this version built into the binary?

deniz

Deniz Yuret

Oct 3, 2015, 9:05:51 PM
to julia-dev
I tried valgrind and the output is at https://gist.github.com/denizyuret/d365d1215efac5e62348

The command I used was: valgrind --leak-check=yes julia-debug.

Profiling starts at line 119.  There seems to be some trouble at lines 164, 248, etc.
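
I may also rerun with --smc-check=all-non-file and Julia's valgrind suppressions file (I think it is contrib/valgrind-julia.supp in the source tree), which the dev docs suggest for JIT'd code, in case some of the libunwind noise is spurious.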

Jameson Nash

Oct 3, 2015, 9:43:13 PM
to julia-dev
there's only one valgrind report that looks bad to me; the rest are essentially nuisance messages from libunwind (msync_validate):

==15636== Invalid read of size 8
==15636==    at 0x58DF33D: access_mem (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x58DD3FF: is_plt_entry (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x58DD59B: _ULx86_64_step (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x4FAF6CE: rec_backtrace_ctx (task.c:661)
==15636==    by 0x4FAF5F9: rec_backtrace (task.c:645)
==15636==    by 0x4FC7030: profile_bt (signals-linux.c:17)
==15636==    by 0x63F770F: ??? (in /lib64/libpthread-2.12.so)
==15636==    by 0xD70B14F: ??? (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/sys-debug.so)
==15636==    by 0x10FEFFE2BF: ???
==15636==    by 0xD77AC1F: jlcall_finalizer_2641 (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/sys-debug.so)
==15636==    by 0x4F005C1: jl_apply (julia.h:1324)
==15636==    by 0x4F067EB: jl_apply_generic (gf.c:1684)
==15636==  Address 0x10feffe2c0 is not stack'd, malloc'd or (recently) free'd

it looks like this may be https://github.com/JuliaLang/julia/pull/12380 after all.


Yichao Yu

Oct 3, 2015, 11:39:07 PM
to Julia Dev
On Sat, Oct 3, 2015 at 8:16 PM, Deniz Yuret <deniz...@gmail.com> wrote:
> I could not generate a small example that consistently generates a segfault
> (yet). While trying I noticed that the segfault always occurs during
> profiling but does not occur in the same place in the program. That brings
> some bad interaction with gc to mind. I don't know what else is
> non-deterministic in Julia. I also ran julia-debug under gdb, but the
> segfaults appear all over the place: https://gist.github.com/denizyuret has
> the first 7 examples typing the exact same commands to a fresh julia
> session.

The output you posted doesn't seem to include the code that is actually run.

Deniz Yuret

Oct 4, 2015, 12:40:20 AM
to Julia Dev
I could not find a nice small standalone example, but if you don't mind installing some stuff, here are the instructions:

Pkg.init()
Pkg.clone("git://github.com/denizyuret/Knet.jl.git")
Pkg.build("Knet")
include(Pkg.dir("Knet/examples/linreg.jl"))
@time linreg()
@profile linreg()
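
If anyone tries this, lowering the sampling rate and dumping whatever was collected before the crash might also help narrow things down, roughly like this (the positional Profile.init(nsamples, delay) form is what I remember from 0.4):

Profile.init(10^7, 0.01)   # bigger sample buffer, 10ms between samples
@profile linreg()
Profile.print()            # report whatever backtraces were collected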