Basic problem: blob size exceeds INT_MAX, when using ImageData layer

p.Paul

Feb 7, 2017, 9:53:48 AM

to Caffe Users

I am using a grayscale image (1x1920x1080) as input and a colour image (3x1920x1080) as output.

My prototxt file looks like this:

layer {
name: "image_data"
type: "ImageData"
top: "data"
top: "label1"
include {
    phase: TRAIN
}
transform_param {
    scale: 0.00390625
}
image_data_param {
    source: "data/imagenet/deps/train_pp.txt"
    root_folder: "data/imagenet/deps/"
    batch_size: 50
    is_color: false
    shuffle: false
}
}

layer {
name: "lp_labels"
type: "ImageData"
top: "lp_labels"
top: "label2"
include {
    phase: TRAIN
}
transform_param {
    scale: 0.00390625
}
image_data_param {
    source: "data/imagenet/rgb/val_pp.txt"
    root_folder: "data/imagenet/rgb/"
    batch_size: 50
    is_color: true
    shuffle: false
}
}
...

layer {
name: "lp_fc8"
type: "InnerProduct"
bottom: "fc7"
top: "lp_fc8"
param {
    lr_mult: 1
    decay_mult: 1
}
param {
    lr_mult: 2
    decay_mult: 0
}
inner_product_param {

    num_output: 2073600  # 1920x1080 = 2073600; 3x1920x1080 = 6220800; 50x3x1920x1080 = 311040000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
}
}

layer {
name: "sigmoid8"
type: "Sigmoid"
bottom: "lp_fc8"
top: "predict"
}

layer {
name: "lp_loss"
type: "EuclideanLoss"
bottom: "predict"
bottom: "lp_labels"
top: "lp_loss"
loss_param {

}
loss_weight: 20
}

I am getting the error shown below, even though the size is < 2 GB.
1: Are there any constraints on the number of outputs of an InnerProduct layer?
2: Or is there some relationship between the number of outputs of the previous or following layers?

I am new to Caffe, please help me.

Error:
lp_fc8 -> lp_fc8
shape[i] <= 2147483647 / count_ (4096 vs. 1035) blob size exceeds INT_MAX

p.Paul

Feb 7, 2017, 10:13:52 AM

to Caffe Users

I have tried all of the following values for num_output; the error is the same:

1920x1080=2073600,
3X1920x1080=6220800
50x3X1920x1080=311040000

Przemek D

Feb 8, 2017, 2:55:18 AM

to Caffe Users

Yes, there is a very important connection between InnerProduct layers. They are also called fully-connected (FC) layers because they connect each pixel of the input blob to each pixel of the output blob. So if you have an FC layer that takes 1024 input pixels and outputs 1024 pixels, you'll find that the parameters of this layer weigh 4 bytes per weight * 1024 outputs * 1024 inputs = 4 megabytes. You're trying to make a fully-connected layer with 2 million outputs. It seems like this layer is getting too large. What is the shape of blob fc7?
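The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not Caffe code: the helper name `fc_param_bytes` is made up here, and it assumes 32-bit floats (4 bytes per weight) and ignores the bias terms.

```python
def fc_param_bytes(num_inputs, num_outputs, bytes_per_weight=4):
    """Approximate memory used by the weight matrix of a fully-connected layer.

    Hypothetical helper: assumes one weight per (input, output) pair,
    stored as a 32-bit float, with bias terms left out.
    """
    return num_inputs * num_outputs * bytes_per_weight


# The 1024-in, 1024-out example from the reply, expressed in MiB.
print(fc_param_bytes(1024, 1024) / 2**20)  # 4.0
```

The cost grows with the product of the input and output sizes, so doubling both quadruples the memory.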

p.Paul

Feb 8, 2017, 5:32:38 AM

to Caffe Users

Thank you very much for your response.

layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"

param {
    lr_mult: 1
    decay_mult: 1
}
param {
    lr_mult: 2
    decay_mult: 0
}
inner_product_param {

    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
    dropout_ratio: 0.5
}
}
This is what fc7 looks like.

I have 3-channel colour images as labels and I need the same from the last layer's prediction. What should I do? I am sorry if it is a stupid question!

p.Paul

Feb 9, 2017, 11:11:49 AM

to Caffe Users

Could you please explain the connection?

Przemek D

Feb 13, 2017, 4:00:36 AM

to Caffe Users

An InnerProduct layer connects each input pixel to each output pixel with a weight. Your fc7 outputs 4,096 pixels. Your fc8 takes these 4,096 as input and outputs 2,073,600 pixels. This gives 8,493,465,600 weights, which is well above INT_MAX (2,147,483,647, the limit named in your error message) and would weigh almost 32 gigabytes - and this is just for the smallest 1920x1080 image. Obviously it is impossible to create a layer of this size (because GPUs with that much memory do not currently exist).
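To see where the "(4096 vs. 1035)" in the error message comes from, here is a minimal Python sketch of the kind of size check that message describes. It is not Caffe's source: the function name `check_blob_shape` is hypothetical, and it assumes the fc8 weight blob has shape (num_output, num_inputs) = (2073600, 4096), which is consistent with the numbers in the log.

```python
INT_MAX = 2**31 - 1  # 2147483647, the limit named in the error message


def check_blob_shape(shape):
    """Return the total element count, or raise if it would exceed INT_MAX.

    Hypothetical re-implementation of the check described by the error:
    each dimension must not exceed INT_MAX divided by the running count.
    """
    count = 1
    for dim in shape:
        limit = INT_MAX // count
        if dim > limit:
            raise OverflowError(f"blob size exceeds INT_MAX ({dim} vs. {limit})")
        count *= dim
    return count


# Assumed shape of the fc8 weight blob: 2,073,600 outputs x 4,096 inputs.
try:
    check_blob_shape((2073600, 4096))
except OverflowError as err:
    message = str(err)
    print(message)  # blob size exceeds INT_MAX (4096 vs. 1035)

# Largest num_output that would pass this check with 4,096 inputs.
max_outputs = INT_MAX // 4096
print(max_outputs)  # 524287
```

Under these assumptions, any num_output above 524,287 fails the size check when fc7 has 4,096 outputs, which would explain why all three values tried earlier in the thread gave the same error. Passing the check says nothing about whether the layer fits in memory.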

Besides, FC layers are not a good choice if you want to input an image and output another image. Why? Because FC destroys all spatial information. Since everything is connected to everything, you cannot by any means retrieve what was where. This is a domain of convolutional networks, which preserve spatial information due to the locality of their connections.
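A rough parameter count shows how differently the two layer types scale. This is a sketch under stated assumptions: the helper names are made up, the 3x3 kernel is an arbitrary illustrative choice, and a single convolution is of course not a complete colourisation network.

```python
def fc_params(num_inputs, num_outputs):
    """Weights plus biases of a fully-connected layer (hypothetical helper)."""
    return num_inputs * num_outputs + num_outputs


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel convolution (hypothetical helper).

    The count does not depend on the image height or width, because the
    same small kernel is reused at every spatial position.
    """
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


height, width = 1080, 1920

# Fully-connected mapping from a grayscale frame to an RGB frame.
fc_count = fc_params(1 * height * width, 3 * height * width)

# Assumed 3x3 convolution mapping 1 input channel to 3 output channels.
conv_count = conv_params(in_channels=1, out_channels=3, kernel_size=3)

print(fc_count > 2**31 - 1)  # True
print(conv_count)            # 30
```

The convolution keeps the same 30 parameters for any image size, while the fully-connected count grows with the product of the input and output pixel counts.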

I don't know what you are trying to do, but this sounds like either some kind of autoencoder (or other encoder-decoder setup) or a segmentation task - neither is a good choice for your first project in Caffe. I recommend reading "Creation of a Deep Convolutional Auto-Encoder in Caffe" (Turchenko, Luczak) and "Fully Convolutional Networks for Semantic Segmentation" (Long et al.) for examples of networks that input and output images, just to see how this task is currently approached.

p.Paul

Feb 13, 2017, 4:37:58 AM

to Caffe Users

Thank you very much. :)
