HDF5-io error for input data


Max Gordon

Dec 30, 2015, 9:56:44 AM
to Caffe Users
After adding two outcomes to my network, the network crashes when trying to read the dataset. I'm not sure why this suddenly happens; below is a description of the error and the setup. I don't think it's related to this bug: https://github.com/BVLC/caffe/issues/1726, which addresses output data, but I'm currently not sure where to even start looking for the problem. I'm running the latest git version on an Ubuntu 14.04 machine with two K40c cards.

Layer setup

I'm using HDF5 containers since I have 7 labels associated with each image. In the prototxt I use a simple Slice layer to separate the label blob coming from the HDF5 file:

layer {
  name: "Label_slicer"
  type: "Slice"
  bottom: "label"
  top: "label_var1"
  top: "label_var2"
  top: "label_var3"
  top: "label_var4"
  top: "label_var5"
  top: "label_var6"
  top: "label_var7"
  slice_param {
    slice_point: 1
    slice_point: 2
    slice_point: 3
    slice_point: 4
    slice_point: 5
    slice_point: 6
    axis: 1
  }
}
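In numpy terms, the six slice points split the (N, 7) label blob along axis 1 into seven single-column blobs, one per outcome. A small sketch of the equivalent operation (batch size and values here are illustrative, not my real data):

```python
import numpy as np

# Hypothetical batch of 4 samples with 7 labels each,
# i.e. the same shape as the "label" blob from the HDF5 file
label = np.arange(28, dtype=np.float32).reshape(4, 7)

# slice_point: 1..6 on axis 1 cuts the blob into seven (4, 1) pieces,
# analogous to the Slice layer's tops label_var1..label_var7
parts = np.split(label, [1, 2, 3, 4, 5, 6], axis=1)

assert len(parts) == 7
assert all(p.shape == (4, 1) for p in parts)
```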


On top of each output I have an accuracy and a softmax layer:

layer {
  name: "accuracy_prev_fracture"
  type: "Accuracy"
  bottom: "fc8_l1"
  bottom: "label_var1"
  top: "accuracy_var1"
  include {
    phase: TEST
  }
  accuracy_param {
    ignore_label: 0
  }
}
layer {
  name: "loss_var1"
  type: "SoftmaxWithLoss"
  bottom: "fc8_l1"
  bottom: "label_var1"
  top: "loss_prev_fracture"
  loss_param {
    ignore_label: 0
  }
}

Each output has its own fully connected layer, i.e. the fc8. At the bottom I have two simple HDF5 data layers:

layer {
  name: "Wrists"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/media/max/Encrypted/Processed/hdf5/all_train.txt"
    batch_size: 256
  }
}
layer {
  name: "Wrists"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "/media/max/Encrypted/Processed/hdf5/all_validation.txt"
    batch_size: 50
  }
}
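Each .h5 file listed in the source text file needs matching "data" and "label" datasets. A minimal h5py sketch of producing a compatible pair (the file name, shapes, and label values here are hypothetical, not my real data):

```python
import h5py
import numpy as np

h5_path = "example_train.h5"

# 10 oversampled crops of a 3-channel 227x227 image,
# plus a 7-element label vector per crop
data = np.random.randn(10, 3, 227, 227).astype(np.float32)
label = np.tile(np.array([0, 0, 1, 2, 1, 2, 2], dtype=np.float32), (10, 1))

with h5py.File(h5_path, "w") as f:
    # gzip compression matches the deflate filter visible in the error trace
    f.create_dataset("data", data=data, compression="gzip")
    f.create_dataset("label", data=label)

# The "source" parameter points to a text file listing one .h5 path per line
with open("all_train.txt", "w") as f:
    f.write(h5_path + "\n")
```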


The error

The rest is basically a standard AlexNet setup. I had previously run this setup with 5 output layers; after adding the additional outputs I get this cryptic HDF5 error:

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140690729749056:
  #000: ../../../src/H5Dio.c line 182 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ../../../src/H5Dio.c line 550 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ../../../src/H5Dchunk.c line 1837 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: ../../../src/H5Dchunk.c line 2868 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: ../../../src/H5Z.c line 1175 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #005: ../../../src/H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
    major: Data filters
    minor: Unable to initialize object
F1230 15:28:18.565651 23368 hdf5.cpp:72] Check failed: status >= 0 (-1 vs. 0) Failed to read float dataset data
*** Check failure stack trace: ***
    @     0x7ff51c22ddaa  (unknown)
    @     0x7ff51c22dce4  (unknown)
    @     0x7ff51c22d6e6  (unknown)
    @     0x7ff51c230687  (unknown)
    @     0x7ff51c783dfb  caffe::hdf5_load_nd_dataset<>()
    @     0x7ff51c87ecee  caffe::HDF5DataLayer<>::LoadHDF5FileData()
    @     0x7ff51c9209f5  caffe::HDF5DataLayer<>::Forward_gpu()
    @     0x7ff51c7f2a81  caffe::Net<>::ForwardFromTo()
    @     0x7ff51c7f2e07  caffe::Net<>::ForwardPrefilled()
    @     0x7ff51c7a6c41  caffe::Solver<>::Step()
    @     0x7ff51c7a7645  caffe::Solver<>::Solve()
    @     0x7ff51c7bdb95  caffe::P2PSync<>::run()
    @           0x40a6e1  train()
    @           0x408421  main
    @     0x7ff51ad30ec5  (unknown)
    @           0x408bdd  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

Some debugging that I've done

I'm not aware of any new formatting of my data. When debugging the H5 container files:
import h5py
import numpy as np
import random

for h5_file_name in random.sample(h5_files, 4):
    with h5py.File(h5_file_name, "r") as h5_file:
        data = h5_file["data"]
        label = h5_file["label"]
        print h5_file_name
        print np.array(label)
        print "The current range is %.2f to %.2f with a mean of %.2f" % \
            (np.min(data), np.max(data), np.mean(data))

I get a reasonable output (there are 10 rows since the image is oversampled):

/media/max/Encrypted/Processed/hdf5/img_train_36943.h5
[[ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]
 [ 0.  0.  1. -1.  1. -1. -1.]]
The current range is -0.70 to 0.33 with a mean of 0.00

I guess changing to a different input type would be fine, but I haven't found any good examples of how to set up a vector input; this doesn't seem to be available in the standard leveldb. Any suggestions on how to address the issue or how to set up a different input structure are appreciated.

Max Gordon

Jan 25, 2016, 3:29:08 AM
to Caffe Users
The error seems to be that labels must be >= 0. There seems to have been a change that I missed, and after switching to non-negative label values everything runs.
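For anyone else hitting this, a minimal sketch of the fix, assuming labels coded as {-1, 0, 1} like in my dump above: shift every class up by one so all values are non-negative (the ignore_label in the prototxt then has to be shifted accordingly).

```python
import numpy as np

# Hypothetical label row in the old {-1, 0, 1} coding
label = np.array([[0., 0., 1., -1., 1., -1., -1.]], dtype=np.float32)

# Remap {-1, 0, 1} -> {0, 1, 2} so SoftmaxWithLoss sees non-negative classes;
# ignore_label: 0 in the prototxt would then become ignore_label: 1
shifted = label + 1.0

assert shifted.min() >= 0
```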

Jan C Peters

Jan 25, 2016, 9:48:14 AM
to Caffe Users
Really? That is strange; disallowing negative values does not make any sense to me. Negative values can actually be very meaningful in some regression problems, or in zero-mean image data.

What does _not_ make sense, however, is using negative values as class labels in the way a SoftmaxWithLoss layer expects them; those need to be in the [0, N-1] range.

Jan

Max Gordon

Jan 25, 2016, 10:06:53 AM
to Caffe Users
Makes sense, although it's not entirely intuitive as a n00b. I wish the error were slightly more informative/guiding.