Saving snapshot fails!

Hossein Hasanpour

Apr 13, 2016, 4:02:15 AM
to Caffe Users
Hello all, saving snapshots fails in Caffe with this error:
I0413 12:18:18.139292 22191 solver.cpp:466] Snapshotting to HDF5 file examples/cifar10_full_relu_bn_iter_5000.caffemodel.h5
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 0:
  #000: H5G.c line 314 in H5Gcreate2(): unable to create group
    major: Symbol table
    minor: Unable to initialize object
  #001: H5Gint.c line 194 in H5G__create_named(): unable to create and link to group
    major: Symbol table
    minor: Unable to initialize object
  #002: H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1674 in H5L_link_cb(): name already exists
    major: Symbol table
    minor: Object already exists
F0413 12:18:18.184839 22191 net.cpp:945] Check failed: layer_data_hid >= 0 (-1 vs. 0) Error saving weights to examples/cifar10_full_relu_bn_iter_5000.caffemodel.h5.
*** Check failure stack trace: ***
    @     0x7fc0d3364daa  (unknown)
    @     0x7fc0d3364ce4  (unknown)
    @     0x7fc0d33646e6  (unknown)
    @     0x7fc0d3367687  (unknown)
    @     0x7fc0d3a5e256  caffe::Net<>::ToHDF5()
    @     0x7fc0d3a7d96c  caffe::Solver<>::SnapshotToHDF5()
    @     0x7fc0d3a7fc30  caffe::Solver<>::Snapshot()
    @     0x7fc0d3a80a8c  caffe::Solver<>::Step()
    @     0x7fc0d3a81239  caffe::Solver<>::Solve()
    @           0x40818e  train()
    @           0x405a0c  main
    @     0x7fc0d2672ec5  (unknown)
    @           0x406141  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

What is wrong?

Jiaming Hong

Apr 17, 2016, 11:47:06 PM
to Caffe Users
I ran into the same problem. Did you get it solved?

Heng Xiong

May 27, 2016, 2:07:59 PM
to Caffe Users
I'm getting the same error. Another post suggests it may be caused by duplicate layer names, but the only duplicate in my net is the pair of "data" layers that share a name, one for each phase, as many of the examples do.

Hossein Hasanpour

May 27, 2016, 2:30:11 PM
to Caffe Users
I don't remember exactly what I did to get rid of the problem. I have since moved to Windows and changed both my model and my solver, so I'm not sure whether it was caused by the model (layer definitions) or by a Caffe bug at the time, which may still be present.
I do remember that it happened after I changed my CNN architecture by adding more layers and filters: training would run for several thousand iterations and then suddenly crash while saving the snapshot!

Hossein Hasanpour

Sep 6, 2016, 2:17:26 PM
to Caffe Users
I remember now: the cause was a "/" in a layer name; see https://github.com/BVLC/caffe/issues/4267
I ran into it again today!
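
For anyone else who hits this: HDF5 treats "/" as the separator in group paths, so a layer name containing one is read as a nested path rather than a single group name. Below is a rough sketch of a pre-flight check that lists such layer names before you start a long training run. It is only a sketch: it assumes the net is in the usual text prototxt format with `layer { ... }` (or older `layers { ... }`) blocks, and the helper names `layer_names` and `names_with_slash` are made up for this post, not part of Caffe.

```python
import re

# Tokens: comments, quoted strings, braces/colons, and bare words.
# Assumption: strings are double-quoted and contain no escaped quotes.
TOKEN = re.compile(r'#[^\n]*|"[^"]*"|[{}:]|[^\s{}:"#]+')


def layer_names(prototxt_text):
    """Return the top-level `name` of every layer block, in file order."""
    tokens = [t for t in TOKEN.findall(prototxt_text) if not t.startswith("#")]
    names = []
    depth = 0          # current brace nesting depth
    in_layer = False   # True while inside a top-level layer block
    for i, tok in enumerate(tokens):
        if tok == "{":
            if depth == 0 and i > 0 and tokens[i - 1] in ("layer", "layers"):
                in_layer = True
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth == 0:
                in_layer = False
        elif (tok == "name" and in_layer and depth == 1
              and i + 2 < len(tokens)
              and tokens[i + 1] == ":"
              and tokens[i + 2].startswith('"')):
            # Only names directly inside the layer block count; names in
            # nested blocks such as `param { name: ... }` are skipped.
            names.append(tokens[i + 2][1:-1])
    return names


def names_with_slash(names):
    """Return the layer names that contain the HDF5 path separator."""
    return [n for n in names if "/" in n]
```

To use it, read your net definition into a string (for example `open("train_val.prototxt").read()`, where the file name is just a placeholder), pass it to `layer_names`, and rename anything that `names_with_slash` reports, e.g. `conv1/bn` to `conv1_bn`. Snapshotting in the default binary format instead of HDF5 may also sidestep the issue, but I have not verified that here.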