After adding two outcomes to my network, the network crashes when trying to read the dataset. I'm not sure why this suddenly happens; below is a description of the error and my setup. I don't think it's related to the bug https://github.com/BVLC/caffe/issues/1726, which addresses output data, but I'm currently not sure even where to start looking for the problem. I'm running the latest git version on an Ubuntu 14.04 machine with two K40c cards.
Layer setup
I'm using HDF5 containers as I have 7 labels associated with each image. In the prototxt I use a simple Slice layer to separate the label blob from the HDF5 file:
layer {
  name: "Label_slicer"
  type: "Slice"
  bottom: "label"
  top: "label_var1"
  top: "label_var2"
  top: "label_var3"
  top: "label_var4"
  top: "label_var5"
  top: "label_var6"
  top: "label_var7"
  slice_param {
    slice_point: 1
    slice_point: 2
    slice_point: 3
    slice_point: 4
    slice_point: 5
    slice_point: 6
    axis: 1
  }
}
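For reference, the slice_point semantics above can be checked with a quick NumPy sketch (batch size and shapes are placeholders; this is just my mental model of what the Slice layer does, not Caffe's code):

```python
import numpy as np

# Hypothetical batch of labels, shape (batch, 7), matching the "label" blob.
labels = np.arange(28, dtype=np.float32).reshape(4, 7)

# The six slice_points (1..6) on axis 1 split the blob into seven
# (batch, 1) tops, one per label variable.
slice_points = [1, 2, 3, 4, 5, 6]
tops = np.split(labels, slice_points, axis=1)

assert len(tops) == 7
assert all(t.shape == (4, 1) for t in tops)
```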
On top of each output I have an Accuracy and a SoftmaxWithLoss layer:
layer {
  name: "accuracy_prev_fracture"
  type: "Accuracy"
  bottom: "fc8_l1"
  bottom: "label_var1"
  top: "accuracy_var1"
  include {
    phase: TEST
  }
  accuracy_param {
    ignore_label: 0
  }
}
layer {
  name: "loss_var1"
  type: "SoftmaxWithLoss"
  bottom: "fc8_l1"
  bottom: "label_var1"
  top: "loss_prev_fracture"
  loss_param {
    ignore_label: 0
  }
}
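For clarity, here is a rough NumPy sketch of what I understand each SoftmaxWithLoss output with ignore_label: 0 to compute (my own approximation, not Caffe's code): samples whose label equals ignore_label are dropped, and the loss is averaged over the remaining ones. Labels in the sketch must be non-negative class indices.

```python
import numpy as np

def softmax_loss(scores, labels, ignore_label=0):
    """Approximate SoftmaxWithLoss with ignore_label.

    scores: (N, C) raw outputs, labels: (N,) class indices.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable softmax over the class axis.
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Drop samples carrying the ignored label before averaging.
    keep = labels != ignore_label
    if not keep.any():
        return 0.0
    return float(-np.log(probs[keep, labels[keep]]).mean())
```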
Each output has its own fully connected layer, i.e. the fc8. At the bottom I have two simple HDF5Data layers:
layer {
  name: "Wrists"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/media/max/Encrypted/Processed/hdf5/all_train.txt"
    batch_size: 256
  }
}
layer {
  name: "Wrists"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "/media/max/Encrypted/Processed/hdf5/all_validation.txt"
    batch_size: 50
  }
}
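For completeness, the source file above is just a plain text list with one .h5 path per line. A sketch of how I generate it (the paths and output filename here are placeholders, not my real ones):

```python
# The HDF5Data "source" is a plain text file listing one container per line.
h5_paths = [
    "/media/max/Encrypted/Processed/hdf5/img_train_00001.h5",
    "/media/max/Encrypted/Processed/hdf5/img_train_00002.h5",
]
with open("all_train_example.txt", "w") as f:
    f.write("\n".join(h5_paths) + "\n")
```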
The error
The rest is basically a standard AlexNet setup. I had previously run this setup with 5 output layers; after adding the additional outputs I get the cryptic HDF5 error:
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140690729749056:
#000: ../../../src/H5Dio.c line 182 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: ../../../src/H5Dio.c line 550 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#002: ../../../src/H5Dchunk.c line 1837 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#003: ../../../src/H5Dchunk.c line 2868 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#004: ../../../src/H5Z.c line 1175 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#005: ../../../src/H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
F1230 15:28:18.565651 23368 hdf5.cpp:72] Check failed: status >= 0 (-1 vs. 0) Failed to read float dataset data
*** Check failure stack trace: ***
@ 0x7ff51c22ddaa (unknown)
@ 0x7ff51c22dce4 (unknown)
@ 0x7ff51c22d6e6 (unknown)
@ 0x7ff51c230687 (unknown)
@ 0x7ff51c783dfb caffe::hdf5_load_nd_dataset<>()
@ 0x7ff51c87ecee caffe::HDF5DataLayer<>::LoadHDF5FileData()
@ 0x7ff51c9209f5 caffe::HDF5DataLayer<>::Forward_gpu()
@ 0x7ff51c7f2a81 caffe::Net<>::ForwardFromTo()
@ 0x7ff51c7f2e07 caffe::Net<>::ForwardPrefilled()
@ 0x7ff51c7a6c41 caffe::Solver<>::Step()
@ 0x7ff51c7a7645 caffe::Solver<>::Solve()
@ 0x7ff51c7bdb95 caffe::P2PSync<>::run()
@ 0x40a6e1 train()
@ 0x408421 main
@ 0x7ff51ad30ec5 (unknown)
@ 0x408bdd (unknown)
@ (nil) (unknown)
Aborted (core dumped)
Some debugging that I've done
I'm not aware of any change in the formatting of my data. When debugging the HDF5 container files:
import random

import h5py
import numpy as np

# h5_files is the list of container paths from all_train.txt
for h5_file_name in random.sample(h5_files, 4):
    with h5py.File(h5_file_name, "r") as h5_file:
        data = h5_file["data"]
        label = h5_file["label"]
        print(h5_file_name)
        print(np.array(label))
        print("The current range is %.2f to %.2f with a mean of %.2f" %
              (np.min(data), np.max(data), np.mean(data)))
I get a reasonable output (there are 10 rows since the image is oversampled):
/media/max/Encrypted/Processed/hdf5/img_train_36943.h5
[[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]
[ 0. 0. 1. -1. 1. -1. -1.]]
The current range is -0.70 to 0.33 with a mean of 0.00
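One more check I could run: since the failure comes from H5Z_filter_deflate (the gzip filter), my guess is that one container's compressed chunks are corrupt on disk, which a random sample of 4 files would easily miss. A brute-force scan that fully decompresses every dataset in every file should pinpoint the bad container (find_corrupt is my own helper sketch, not part of Caffe):

```python
import h5py
import numpy as np

def find_corrupt(h5_files):
    """Return (path, error) pairs for containers that fail to read fully."""
    bad = []
    for name in h5_files:
        try:
            with h5py.File(name, "r") as f:
                for key in f:
                    np.asarray(f[key])  # forces full chunk decompression
        except Exception as e:
            bad.append((name, str(e)))
    return bad
```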
I guess changing to a different input type would be fine, but I haven't found any good examples of how to set up a vector input - this doesn't seem to be available in the standard leveldb. Any suggestions on how to address the issue, or on how to set up a different input structure, are appreciated.
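If recreating the containers turns out to be the fix, a minimal h5py sketch for writing one is below. The shapes and filename are placeholders (10 oversampled crops, 7 labels per image as in my data); note that no compression= filter is passed, which avoids the deflate path that fails above:

```python
import h5py
import numpy as np

# Placeholder blobs: 10 oversampled 3x227x227 crops and their 7 labels.
data = np.zeros((10, 3, 227, 227), dtype=np.float32)
label = np.tile([0, 0, 1, -1, 1, -1, -1], (10, 1)).astype(np.float32)

with h5py.File("img_train_rewritten.h5", "w") as f:
    f.create_dataset("data", data=data)    # no compression filter
    f.create_dataset("label", data=label)
```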