Simple MLP regression always predicts same value?


Bo Moon

Apr 18, 2016, 9:46:27 AM
to Caffe Users
I'm trying to get regression working with a simple example by learning the function f(a,b) = 3a + 5b. I create sample data and train an MLP, but the net's output is always 0. Am I using the wrong layer or command somewhere?

I use the following to create the data points in HDF5:
import numpy as np
import h5py
import sys
import caffe

file_labels = 'simple_labels.txt'
outputH5_prefix = 'simple'

X = []
y = []
for i in range(100):
    for j in range(100):
        f = 3 * i + 5 * j
        X.append([i, j])
        y.append([f])

outputH5 = outputH5_prefix + ".h5"

with h5py.File(outputH5,'w') as H:
    H.create_dataset( 'data', data=np.array(X).astype(np.float32) ) 
    H.create_dataset( 'label', data=np.array(y).astype(np.float32) )

with open(outputH5_prefix + '_h5_list.txt','w') as L:
    L.write(outputH5)

Here's the solver.prototxt:
net: "simple_net.prototxt"

type: "SGD"

test_iter: 10
test_interval: 50

base_lr: 0.01
weight_decay: 0.00005
momentum: 0.9

lr_policy: "step"
gamma: 0.1
stepsize: 10

display: 10
max_iter: 80000

snapshot: 1000
snapshot_prefix: "snapshots/simple_net"

solver_mode: CPU

Training net prototxt:
name: "simple_net"

layer {
  name: "input"
  type: "HDF5Data"

  hdf5_data_param {
    source: "simple_h5_list.txt"
    batch_size: 32
  }

  top: "data"
  top: "label"
}


layer {
  name: "relu1"
  type: "ReLU"
  bottom: "data"
  top: "conv1"
}

layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "conv1"
  top: "ip1"
}


layer {
  name: "relu3"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}

layer {
  name: "relu4"
  type: "ReLU"
  bottom: "ip2"
  top: "ip2"
}

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip2" 
  bottom: "label" 
}

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
}

Deploy net prototxt:
name: "letter_net"
input: "data"
input_dim: 10
input_dim: 1
input_dim: 1
input_dim: 2

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "data"
  top: "conv1"
}

layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "conv1"
  top: "ip1"
}

layer {
  name: "relu3"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}

layer {
  name: "relu4"
  type: "ReLU"
  bottom: "ip2"
  top: "ip2"
}

Then I train using the command line. Finally, here's how I view the net output:
import numpy as np
import caffe

net = caffe.Net("simple_deploy.prototxt", "snapshots/simple_net_iter_80000.caffemodel", caffe.TEST)

net.blobs['data'].reshape(1, 1, 1, 2)
net.blobs['data'].data[...] = [10, 20]

output = net.forward()
print net.blobs['ip2'].data

which always outputs 0. Is there something wrong with how I've defined my net?

Jan

Apr 18, 2016, 10:08:10 AM
to Caffe Users
How about the output during training? Does it look like it is training OK (e.g. the loss decreases steadily)? For a better answer to this question, create another set (with different examples!) and use that as a test set. Look at how the test loss develops.
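
For example (just a sketch that mirrors your training script; the file names here are only suggestions), a held-out set could be written to its own HDF5 file and list file, and a TEST-phase HDF5Data layer pointed at that list:

import numpy as np
import h5py

# 500 random (a, b) pairs drawn from the same range as the training grid,
# but not restricted to the integer points used for training.
X_test = np.random.uniform(0, 100, size=(500, 2)).astype(np.float32)
y_test = (3 * X_test[:, 0] + 5 * X_test[:, 1]).reshape(-1, 1).astype(np.float32)

with h5py.File('simple_test.h5', 'w') as H:
    H.create_dataset('data', data=X_test)
    H.create_dataset('label', data=y_test)

with open('simple_test_h5_list.txt', 'w') as L:
    L.write('simple_test.h5')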

Jan

Jan

Apr 18, 2016, 10:11:48 AM
to Caffe Users
By the way: an Accuracy layer is completely misplaced in regression tasks. There is usually no well-defined notion of "accuracy" in regression (other than a mean loss value or similar). The Caffe Accuracy layer expects integer class indices as labels and, as data, a predicted discrete distribution over those classes, so in this case it makes no sense at all.
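
If you do want an accuracy-like number for a regression net, you could compute something like the mean absolute error yourself from the predictions, e.g. (a minimal numpy sketch; preds and labels are assumed to be arrays of the same shape):

import numpy as np

def mean_absolute_error(preds, labels):
    # Average absolute difference between predictions and targets.
    return np.mean(np.abs(np.asarray(preds, dtype=np.float64) -
                          np.asarray(labels, dtype=np.float64)))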

Jan

Bo Moon

Apr 19, 2016, 8:35:38 AM
to Caffe Users
Thanks for the tip; I removed the accuracy layer. If I plot the training loss, it oscillates the whole time: it starts at around 60 and never gets below 54. I also normalized the data to lie in (0, 1), and now the net's output is always 1, so it predicts the wrong value almost every time. The test loss is basically the same. I've run the classification tutorials from the website successfully, so I'm wondering whether I'm defining the input incorrectly or defining the net layers poorly.
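
(For reference, the normalization I mean is just a min-max rescaling applied to the X and y lists before writing the HDF5 file, roughly like this; the constants come from my generation loop, where 99 is the largest input and 3*99 + 5*99 the largest label:)

# Rough sketch of the rescaling, applied before the h5py.File(...) block above.
X = np.array(X, dtype=np.float32) / 99.0
y = np.array(y, dtype=np.float32) / (3 * 99 + 5 * 99)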

Jan

Apr 19, 2016, 8:53:56 AM
to Caffe Users
The training loss is not very meaningful, as it is just the loss of the last processed batch, so it is bound to oscillate. The test loss is more meaningful because it is averaged over all test iterations; it should go down quickly at the start. How much it decreases afterwards depends on many things: a good learning rate policy, good regularization, the data itself... And what "good" means is also very dependent on your problem and your data.
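
As a concrete example of why the learning rate policy matters: Caffe's "step" policy uses base_lr * gamma^floor(iter / stepsize), so with the stepsize of 10 and gamma of 0.1 from your solver the rate is practically zero after a few dozen iterations (a quick sketch using the values from your solver.prototxt):

# Effective learning rate of the "step" policy for the posted solver settings.
base_lr, gamma, stepsize = 0.01, 0.1, 10

def step_lr(it):
    return base_lr * gamma ** (it // stepsize)

for it in (0, 10, 50, 100):
    print it, step_lr(it)
# roughly 0.01, 0.001, 1e-07, 1e-12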

Jan

Bo Moon

Apr 19, 2016, 10:56:04 AM
to Caffe Users
I see, thanks for the help. Before I go crazy with parameter tweaking, can I confirm that my Python script for saving HDF5 data and the net input layer are both written correctly? I haven't found a tutorial for reading/writing non-image data, so I'm slightly worried that I just didn't format the data correctly. If I have though, then I'll go ahead and continue experimenting with parameters.

Jan

Apr 21, 2016, 2:51:16 AM
to Caffe Users
Your script to create the HDF5 data seems perfectly fine to me. Of course you could use the "hdfview" program to view the contents of your HDF5 file. If every row in "data" contains two numbers and the corresponding row in "label" contains the correct result (3*i + 5*j), then everything should be fine. I think it already is. Your problems are probably due to the training parameters.
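
If you don't have hdfview handy, a couple of lines of h5py do the same check (a quick sketch, assuming the file name from your script):

import h5py

# Print the dataset shapes and the first few rows as a sanity check.
with h5py.File('simple.h5', 'r') as H:
    print H['data'].shape, H['label'].shape
    print H['data'][:5]
    print H['label'][:5]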

Jan

Ahmed Ibrahim

Apr 21, 2016, 11:20:57 AM
to Caffe Users
Why does your training net have
name: "simple_net"
while your deploy net has
name: "letter_net"?

I am not sure whether it matters, but fixing it is easy and rules it out as the problem.