Homemade Regression Network Training Issues


bc1...@my.bristol.ac.uk

Jul 28, 2015, 7:43:59 AM
to Caffe Users
Hello,

I am attempting to use Caffe to train a network which takes depth images of humans (an example is the attached depth_290.png) and regresses to a real-valued 3-vector which encodes the human's pose (e.g. [-0.94429719 -0.69393933 -0.42636588]).

We saw reasonable results from fine-tuning the ILSVRC-trained alexnet on this data, but we found that many of the activations in the lower layers were 0 for all inputs. We hypothesised that many of these edge-detecting filters were not relevant to our data, which is fairly edgeless.
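(For reference, a minimal pycaffe sketch of the kind of dead-activation check we did — the prototxt and caffemodel names here are placeholders, not our actual paths:

import numpy as np
import caffe

# Placeholder file names; substitute the fine-tuned model's own files.
net = caffe.Net('train_val.prototxt', 'finetuned_alexnet.caffemodel', caffe.TEST)
net.forward()  # the data layer supplies one batch

# Fraction of exactly-zero (dead ReLU) outputs in the lower conv layers.
for name in ['conv1', 'conv2']:
    acts = net.blobs[name].data
    print(name, 'fraction of zero activations:', np.mean(acts == 0))
)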

So we have started trying to train our own architecture from scratch, for which I have attached the definition, the solver settings and a diagram.

We wanted to keep the network small and shallow so as to enable fairly quick training/feedback. We settled on the filter sizes by scaling them to the sizes of what we imagined our low-, medium- and high-level features to be, e.g. edges, arms/legs and shoulders/torso.

We have a training set of ~7000 images and a 500 image validation set.


There have been a few issues with the training which I hope some of you experienced people could help with:



1) Training is proceeding very slowly: roughly 1/10 of the iterations per hour that I saw when fine-tuning the bvlc_alexnet model.

The blob and filter sizes are:

blob sizes:
[('data', (1, 1, 128, 256)), ('label', (1, 3)), ('conv1', (1, 16, 129, 257)), ('pool1', (1, 16, 65, 129)), ('conv2', (1, 32, 66, 130)), ('pool2', (1, 32, 33, 65)), ('conv3', (1, 32, 34, 66)), ('pool3', (1, 32, 17, 33)), ('fc4', (1, 1024)), ('fc5', (1, 3)), ('loss', ())]
total= 1.014E6 (~1/6 of alexNet)

filter sizes:
[('conv1', (16, 1, 12, 12)), ('conv2', (32, 16, 16, 16)), ('conv3', (32, 32, 14, 14)), ('fc4', (1024, 17952)), ('fc5', (3, 1024))]
total=1.853E7 (~1/5 of alexNet)
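(Both lists were printed with pycaffe, along these lines:

import caffe

net = caffe.Net('train_val.prototxt', caffe.TRAIN)

# Activation (blob) shapes, as listed under 'blob sizes' above.
print([(name, blob.data.shape) for name, blob in net.blobs.items()])

# Learnable parameter (filter) shapes, as listed under 'filter sizes' above.
print([(name, param[0].data.shape) for name, param in net.params.items()])
)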

However, I have been using a batch size of 100 with my 'gaitNet', as opposed to ~40 with alexNet.

If increasing the batch size also proportionally increases the time it takes to run each iteration, is the only benefit of a larger batch size the smoothness of the loss curves?
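(Rough back-of-envelope on throughput, using the figures above — relative units only:

# gaitNet runs ~1/10 the iterations/hour of alexNet, but at batch 100 vs ~40.
alexnet_iters_per_hour = 10.0  # relative
gaitnet_iters_per_hour = 1.0   # relative
alexnet_batch = 40
gaitnet_batch = 100

print('alexNet images/hour (relative):', alexnet_iters_per_hour * alexnet_batch)  # 400.0
print('gaitNet images/hour (relative):', gaitnet_iters_per_hour * gaitnet_batch)  # 100.0
# Per image, gaitNet is ~4x slower rather than 10x once batch size is accounted for.
)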




2) Looking at the filters shows that many of them have areas that have become stuck at 0.

An example is shown in badFilters.png, and the numerical values for that particular filter are:

('filter number =', 13)
('bias =', -0.0031663382)
[[[  0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -8.57868945e-05]
  [ -0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -5.28223626e-03]
  [ -0.00000000e+00   0.00000000e+00   1.13819588e-04 ...,   0.00000000e+00
    -0.00000000e+00  -7.32066343e-03]
  ...,
  [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   4.31772508e-03
     4.43827407e-03   4.62894700e-03]
  [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   1.40158751e-03
     1.68411294e-03   1.93981850e-03]
  [  1.17583564e-04  -3.98732768e-03   1.41068902e-02 ...,  -6.62319944e-04
     5.30000357e-03  -1.04972825e-03]]

 [[ -2.17889603e-02   1.87065755e-03  -3.05620930e-03 ...,  -1.54024875e-03
     0.00000000e+00   0.00000000e+00]
  [  5.06310444e-03   1.66628743e-03   1.69640519e-02 ...,  -5.62675996e-03
     0.00000000e+00   0.00000000e+00]
  [ -8.56398910e-05  -3.06883268e-03   1.67703126e-02 ...,  -1.56719349e-02
     2.26232829e-03  -7.80450134e-03]
  ...,
  [ -0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
     0.00000000e+00  -0.00000000e+00]
  [ -0.00000000e+00  -0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]
  [  0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]]

 [[  1.03708601e-03  -1.73622603e-03   2.55606975e-03 ...,  -1.72646448e-03
     1.01272371e-02   7.27093022e-04]
  [ -9.83404927e-03  -8.40324350e-03   2.91342870e-03 ...,  -2.91749183e-03
    -8.55376420e-04   6.25598757e-03]
  [ -1.93446539e-02  -1.96082015e-02  -1.11028543e-02 ...,   1.17850583e-02
    -1.15024736e-02   2.74277129e-03]
  ...,
  [  1.25999795e-02  -5.72544476e-03  -1.16771702e-02 ...,   0.00000000e+00
     0.00000000e+00   0.00000000e+00]
  [  1.78328641e-02   2.15554046e-05   1.36204557e-02 ...,   0.00000000e+00
    -0.00000000e+00   0.00000000e+00]
  [ -1.48998750e-02   1.99692044e-02  -1.85439140e-02 ...,   0.00000000e+00
     0.00000000e+00  -0.00000000e+00]]

 ...,
 [[ -0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
     0.00000000e+00   0.00000000e+00]
  [  0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]
  [  0.00000000e+00  -0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
     0.00000000e+00   0.00000000e+00]
  ...,
  [  0.00000000e+00   0.00000000e+00  -0.00000000e+00 ...,   1.83713192e-03
     2.48013111e-03   3.03608878e-03]
  [ -0.00000000e+00   0.00000000e+00  -0.00000000e+00 ...,   1.30352436e-03
     3.50991264e-04   7.24491139e-04]
  [ -1.88714452e-02  -9.72690061e-03  -1.35165090e-02 ...,   5.61010977e-03
     2.59630568e-03   2.51519983e-03]]

 [[ -0.00000000e+00  -0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
     0.00000000e+00  -0.00000000e+00]
  [ -0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,  -0.00000000e+00
     0.00000000e+00   0.00000000e+00]
  [ -0.00000000e+00  -0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]
  ...,
  [  0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00   0.00000000e+00]
  [ -0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,  -0.00000000e+00
     0.00000000e+00   0.00000000e+00]
  [ -0.00000000e+00  -0.00000000e+00  -0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00   0.00000000e+00]]

 [[  0.00000000e+00   9.22929157e-24  -2.95314767e-38 ...,   1.52926743e-02
    -1.63858123e-02  -1.33707384e-02]
  [  8.33217148e-03  -1.08572282e-03  -6.57732226e-03 ...,   1.61870793e-02
     1.26907490e-02   3.08887637e-03]
  [  5.52257150e-03   1.79179870e-02   1.18585369e-02 ...,  -7.79362489e-03
     4.90787346e-03  -2.03952356e-03]
  ...,
  [ -0.00000000e+00  -0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]
  [ -0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
     0.00000000e+00  -0.00000000e+00]
  [ -0.00000000e+00   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
    -0.00000000e+00  -0.00000000e+00]]]


What could cause this? Am I correct in thinking that once a weight is zero it is stuck there?
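(This is roughly the pycaffe check I'm using to count the stuck weights and to see whether their gradients are also zero — a sketch, assuming the attached train_val.prototxt:

import caffe

net = caffe.Net('train_val.prototxt', caffe.TRAIN)
net.forward()   # the data layer supplies a training batch
net.backward()  # populates the parameter gradients (diffs)

for name, param in net.params.items():
    w, dw = param[0].data, param[0].diff
    zero = (w == 0)
    print(name, 'weights exactly 0: %.1f%%' % (100.0 * zero.mean()))
    if zero.any():
        print('  of those, gradient also 0: %.1f%%' % (100.0 * (dw[zero] == 0).mean()))
)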





3) I read at http://cs231n.github.io/neural-networks-3/#distr that a good weight initialisation should produce roughly similar variances of the activations across layers. Using the xavier or msra weight initialisations with a 0.001 constant bias filler on all layers, I find the following activation variances:

varConv1 = 1.4363e+04
varConv2 = 2.2032e+05
varConv3 = 9992700
varFc4   = 625.8359
varFc5   = 841.8120


Is this acceptable, or is it cause for concern?
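(A minimal pycaffe sketch for reproducing these measurements, same assumptions as above:

import caffe

net = caffe.Net('train_val.prototxt', caffe.TRAIN)
net.forward()  # one batch through the freshly initialised net

for name in ['conv1', 'conv2', 'conv3', 'fc4', 'fc5']:
    print(name, 'activation variance:', net.blobs[name].data.var())
)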

Also, when using these initialisations I find that the initial losses are very high, and with SGD or Nesterov's momentum the loss goes to inf after a few iterations unless the learning rate is smaller than 1e-6.

We have, however, been using ADAGRAD (which does not diverge), for the unrelated reason that some poses are under-represented in our dataset and we think the AdaGrad per-parameter learning-rate scaling could help with that.
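(For the divergence test I just step the solver by hand and watch the loss — a sketch, assuming the attached solver.prototxt; swap in the solver class matching its solver_type:

import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')

for i in range(20):
    solver.step(1)
    print('iter', i, 'loss =', float(solver.net.blobs['loss'].data))
)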


Finally, before leaving the network to train on the full training set, I ran it on a 20-image subset to see what level of loss it could reach; it managed to reduce the average loss to around the level the fine-tuned alexNet reached after ~200,000 iterations.


Any points or advice would be greatly appreciated.

Thanks for reading,

Ben


badFilters.png
diagram1.png
solver.prototxt
train_val.prototxt
depth_290.png