update to HDF5 causing segfaults?

ben

7 Mar 2014, 13:43:08
to julia...@googlegroups.com
Hi everyone,

A couple of days ago, a module I have been working on for some time started throwing segmentation faults frequently and at random. After a lot of head-scratching, I found out that the HDF5 package was causing this.

My module starts by loading (with HDF5) some data that I processed a month or so ago and stored on my hard drive in native Julia format (with HDF5). The files did not change but the package was updated.

I am not saying that this is a bug in the HDF5 package: maybe the package is not fully backward-compatible with old data, maybe I just need to re-process and re-store the data. I don't have the time to investigate right now so I am posting this hoping that it will save someone redundant head-scratching :)

Ben

Patrick O'Leary

7 Mar 2014, 13:47:02
to julia...@googlegroups.com
The HDF5 file format is stable, so I think it's fair to say there's a behavioral problem of some kind, either a bug or some version mismatching shenanigans.

Could you please post the output of versioninfo(true)?

ben

7 Mar 2014, 14:05:56
to julia...@googlegroups.com
It first appeared on the Ubuntu nightlies PPA:

julia> versioninfo(true)
Julia Version 0.3.0-prerelease
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
           Ubuntu 13.10
  uname: Linux 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64
Memory: 7.708961486816406 GB (1456.62109375 MB free)
Uptime: 13777.0 sec
Load Avg:  0.29345703125  0.31298828125  0.375
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz: 
       speed         user         nice          sys         idle          irq
#1   800 MHz      87816 s        595 s      16024 s    1263335 s          0 s
#2   800 MHz      85845 s          5 s      15028 s    1269540 s          0 s
#3   800 MHz      83522 s         21 s      22295 s    1263617 s          0 s
#4   800 MHz      81962 s          2 s      16941 s    1269705 s          0 s
#5  3401 MHz      72782 s         11 s      13621 s    1286187 s          0 s
#6   800 MHz      83854 s         16 s      10064 s    1276365 s          0 s
#7   800 MHz      80380 s        408 s       9649 s    1278535 s          0 s
#8   800 MHz      86728 s         31 s      16384 s    1257841 s          0 s

  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
Environment:
  TERM = xterm
  XDG_SESSION_PATH = /org/freedesktop/DisplayManager/Session0
  XDG_SEAT_PATH = /org/freedesktop/DisplayManager/Seat0
  DEFAULTS_PATH = /usr/share/gconf/ubuntu.default.path
  PATH = /home/ben/bin:/home/ben/.cabal/bin:/home/ben/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
  MANDATORY_PATH = /usr/share/gconf/ubuntu.mandatory.path
  NODE_PATH = /usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
  HOME = /home/ben
  MATHEMATICA_HOME = /usr/local/Wolfram/Mathematica/9.0
  COMPIZ_BIN_PATH = /usr/bin/

Package Directory: /home/ben/.julia
9 required packages:
 - Cairo                         0.2.12
 - DataArrays                    0.1.3
 - Distributions                 0.4.0
 - HDF5                          0.2.19
 - IJulia                        0.1.4
 - PDMats                        0.1.0
 - ProfileView                   0.0.1
 - PyPlot                        1.2.2
 - Vega                          0.0.0
37 additional packages:
 - ArrayViews                    0.4.1
 - BinDeps                       0.2.12
 - Blocks                        0.0.2
 - Calculus                      0.1.3
 - Cartesian                     0.1.4
 - Codecs                        0.1.0
 - Color                         0.2.8
 - Compose                       0.1.26
 - DataFrames                    0.5.3
 - Datetime                      0.1.2
 - Distance                      0.3.1
 - DualNumbers                   0.0.1
 - GZip                          0.2.12
 - Gadfly                        0.2.5+             master
 - Hexagons                      0.0.1
 - ImageView                     0.0.15
 - Images                        0.2.30
 - IniFile                       0.2.2
 - Iterators                     0.1.2
 - JSON                          0.3.3
 - Loess                         0.0.2
 - NLsolve                       0.1.2              master
 - Nettle                        0.1.3
 - NumericExtensions             0.5.4
 - Optim                         0.2.0
 - Options                       0.2.2
 - PyCall                        0.4.2
 - REPLCompletions               0.0.0
 - SIUnits                       0.0.1
 - SortingAlgorithms             0.0.1
 - StatsBase                     0.3.7
 - TexExtensions                 0.0.1
 - Tk                            0.2.11
 - URIParser                     0.0.1
 - Winston                       0.9.0
 - ZMQ                           0.1.9
 - Zlib                          0.1.5

Then I built from source, and the issue was exactly the same:

julia> versioninfo(true)
Julia Version 0.3.0-prerelease+1905
Commit 2fb42e5* (2014-03-07 14:44 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
           Ubuntu 13.10
  uname: Linux 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64
Memory: 7.708961486816406 GB (1364.75 MB free)
Uptime: 13881.0 sec
Load Avg:  0.59375  0.39453125  0.3984375
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz: 
       speed         user         nice          sys         idle          irq
#1  2100 MHz      88280 s        595 s      16087 s    1273050 s          0 s
#2   800 MHz      86093 s          5 s      15090 s    1279515 s          0 s
#3   800 MHz      83910 s         21 s      22374 s    1273428 s          0 s
#4   800 MHz      82249 s          2 s      17006 s    1279623 s          0 s
#5   800 MHz      72982 s         11 s      13674 s    1296253 s          0 s
#6   800 MHz      84039 s         16 s      10104 s    1286477 s          0 s
#7   800 MHz      80560 s        408 s       9679 s    1288609 s          0 s
#8  3401 MHz      87056 s         31 s      16452 s    1267612 s          0 s

  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm
Environment:
  TERM = xterm
  XDG_SESSION_PATH = /org/freedesktop/DisplayManager/Session0
  XDG_SEAT_PATH = /org/freedesktop/DisplayManager/Seat0
  DEFAULTS_PATH = /usr/share/gconf/ubuntu.default.path
  PATH = /home/ben/bin:/home/ben/.cabal/bin:/home/ben/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
  MANDATORY_PATH = /usr/share/gconf/ubuntu.mandatory.path
  NODE_PATH = /usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
  HOME = /home/ben
  MATHEMATICA_HOME = /usr/local/Wolfram/Mathematica/9.0
  COMPIZ_BIN_PATH = /usr/bin/
  PYTHONHOME = /usr:/usr

Package Directory: /home/ben/.julia

Tim Holy

7 Mar 2014, 16:49:56
to julia...@googlegroups.com
First I've heard of this problem, and it's hard to debug without more
information.

If the problem appeared in v0.2.18, you could just say "git checkout v0.2.17"
and see if that causes the segfaults to go away.

If you updated your Julia as well, of course the segfault could be there.

There are instructions on capturing backtraces in various places, for example
the FAQ and (the nicest): https://gist.github.com/staticfloat/6188418

--Tim

ben

13 Mar 2014, 7:34:02
to julia...@googlegroups.com
Hi Tim, Patrick,

Thanks for your answers. I just realized that the mistake was totally mine. Basically I had changed the definition of the abstract type of the object in question. My apologies. I guess the possibility of storing the definition of the abstract type along with the object and/or more explicit error messages (it took me a while simply to understand that the segfault was caused by something related to HDF5) could be useful, although it is a pretty silly thing to do (changing the type and trying to load the object), so I am not looking for excuses!

At least now I know about debugging.md ;)

Ben

Tim Holy

13 Mar 2014, 9:48:30
to julia...@googlegroups.com
IIRC it does actually store the definition of types in a "hidden" group inside
the file. But in general it's hard to make use of: what if the type is defined
inside a module that hasn't been loaded? So yes, HDF5/JLD are a little bit
fragile when it comes to types; unfortunately, I don't think there is a good
solution---it's basically a consequence of having modules (which are a good
thing!).

--Tim

Patrick O'Leary

13 Mar 2014, 9:54:30
to julia...@googlegroups.com
On Thursday, March 13, 2014 8:48:30 AM UTC-5, Tim Holy wrote:
> IIRC it does actually store the definition of types in a "hidden" group inside
> the file. But in general it's hard to make use of: what if the type is defined
> inside a module that hasn't been loaded? So yes, HDF5/JLD are a little bit
> fragile when it comes to types; unfortunately, I don't think there is a good
> solution---it's basically a consequence of having modules (which are a good
> thing!).

The segfault is suboptimal though. Since you store the structure of the source type on serialization, can we check for the existence of an equivalently-defined type before deserializing? Though the fact that Ben said he changed the *abstract* supertype is confusing; why would that matter? Or do I misunderstand him?
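
To make the suggestion above concrete, here is a minimal sketch of such a check. It is an illustration only: `layout` and `check_layout` are hypothetical helper names, not part of HDF5.jl or JLD, and nothing here describes how JLD actually stores or validates types. It is written for current Julia (1.1 or later, using `struct` and `fieldtypes`) rather than the 0.3-era `type` syntax used elsewhere in this thread.

```julia
# Hypothetical helpers for illustration; not part of HDF5.jl or JLD.

# Describe a composite type as a vector of (field name, field type) strings.
# This is the kind of description that could be written next to the data.
layout(T::DataType) =
    [(String(n), string(t)) for (n, t) in zip(fieldnames(T), fieldtypes(T))]

# Compare a stored description with the current definition of T and
# raise a readable error instead of reading data into a mismatched type.
function check_layout(T::DataType, stored)
    current = layout(T)
    current == stored ||
        error("Type $(nameof(T)) does not match the saved layout: " *
              "stored fields $stored, current fields $current")
    return true
end

# Two stand-in definitions playing the roles of the old and new types.
struct OldPoint
    x::Float64
    y::Float64
end

struct NewPoint
    x::Float64
    y::Float64
    z::Float64
end

saved = layout(OldPoint)        # description recorded at save time
check_layout(OldPoint, saved)   # definition unchanged: returns true
# check_layout(NewPoint, saved) # definition changed: throws an error
```

Comparing names and types as strings keeps the check independent of whether the defining module is loaded, at the cost of missing changes that leave the printed field types the same.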

Tim Holy

13 Mar 2014, 10:02:03
to julia...@googlegroups.com
For me there is no segfault, and indeed there's a friendly error message. Not
sure who wrote it, but maybe it was me :-).

julia> type MyType
           a::Int
       end

julia> m = MyType(7)
MyType(7)

julia> using HDF5, JLD

julia> @save "/tmp/test.jld" m

julia>
tim@diva:~$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+1981 (2014-03-12 06:36 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 7e94cfb* (1 day old master)
|__/                   |  x86_64-linux-gnu

julia> using HDF5, JLD

julia> @load "/tmp/test.jld" m
ERROR: Type MyType is not recognized. As a fallback, you can load /m with readsafely().
 in read at /home/tim/.julia/v0.3/HDF5/src/jld.jl:247
 in read at /home/tim/.julia/v0.3/HDF5/src/jld.jl:209
 in anonymous at no file

Patrick O'Leary

13 Mar 2014, 10:16:31
to julia...@googlegroups.com
On Thursday, March 13, 2014 9:02:03 AM UTC-5, Tim Holy wrote:
> For me there is no segfault, and indeed there's a friendly error message. Not
> sure who wrote it, but maybe it was me :-).

Haha, fair enough.

ben

13 Mar 2014, 13:48:28
to julia...@googlegroups.com
I meant composite! Grr.

Patrick O'Leary

13 Mar 2014, 14:26:11
to julia...@googlegroups.com
That makes rather more sense. Since Tim seems to get a friendly error message in a situation that we believe to be similar to yours, I think a more comprehensive set of steps to reproduce--if indeed you can--will be needed to resolve any issues.

ben

13 Mar 2014, 15:45:25
to julia...@googlegroups.com
Ok, here you go.

1) as is

include("segf.jl")
tt = DDCM.segf();
tt.tpi[1][1,1]   # gives me a serious error [1] or a segfault, depending on my computer's mood

2) in `segf.jl`, switch `include("new_types.jl")` and `include("old_types.jl")`

include("segf.jl")
tt = DDCM.segf();
tt.tpi[1][1,1]   # gives 0.0


Sorry, I was unable to reproduce this starting from scratch, so this is a stripped-down version of my real-world situation, which makes it harder to parse (let me know if something is unclear).


[1] ERROR: no method getindex(SYSTEM: show(lasterr) caused an error
ERROR: no method Enumerate{I}(
 in showerror at repl.jl:111
 in showerror at repl.jl:66
 in anonymous at client.jl:93
 in with_output_color at util.jl:444
 in display_error at client.jl:91SYSTEM: show(lasterr) caused an error
WARNING: it is likely that something important is broken, and Julia will not be able to continue normally
Attachments: new_types.jl, old_types.jl, segf.jl, corruptdata.jld

Tim Holy

13 Mar 2014, 15:49:41
to julia...@googlegroups.com
Yes, if you do have a reproducible way of triggering the segfault, please do
submit as an issue: https://github.com/timholy/HDF5.jl/issues/new

--Tim

ben

13 Mar 2014, 16:48:25
to julia...@googlegroups.com
#81 

Hope this helps!

Ben