Caffe error rate not converging?

olddocks

Feb 3, 2015, 2:19:13 PM
to caffe...@googlegroups.com
Hi

I have finally gotten my network to predict the results of a linear regression, but it seems the network is stuck and not converging. I have tried learning rates of 0.001, 0.01, 0.1, and 1.0. I have 96x grayscale images, and I run 4 convolution and 4 pooling layers to predict 30 outputs.

I0203 20:09:22.603837  3290 solver.cpp:445] Iteration 500, lr = 1
I0203 20:09:39.274199  3290 solver.cpp:209] Iteration 600, loss = 365.468
I0203 20:09:39.274451  3290 solver.cpp:224]     Train net output #0: loss = 365.467 (* 1 = 365.467 loss)
I0203 20:09:39.274487  3290 solver.cpp:445] Iteration 600, lr = 1
I0203 20:09:55.949723  3290 solver.cpp:209] Iteration 700, loss = 57.9001
I0203 20:09:55.949863  3290 solver.cpp:224]     Train net output #0: loss = 57.899 (* 1 = 57.899 loss)
I0203 20:09:55.949892  3290 solver.cpp:445] Iteration 700, lr = 1
I0203 20:10:12.633570  3290 solver.cpp:209] Iteration 800, loss = 386.487
I0203 20:10:12.633800  3290 solver.cpp:224]     Train net output #0: loss = 386.486 (* 1 = 386.486 loss)
I0203 20:10:12.633837  3290 solver.cpp:445] Iteration 800, lr = 1
I0203 20:10:29.324326  3290 solver.cpp:209] Iteration 900, loss = 365.468
I0203 20:10:29.324460  3290 solver.cpp:224]     Train net output #0: loss = 365.467 (* 1 = 365.467 loss)
I0203 20:10:29.324491  3290 solver.cpp:445] Iteration 900, lr = 1
I0203 20:10:46.017035  3290 solver.cpp:334] Snapshotting to /home/pbu/Desktop/tmp_iter_1000.caffemodel
I0203 20:10:46.038336  3290 solver.cpp:342] Snapshotting solver state to /home/pbu/Desktop/tmp_iter_1000.solverstate
I0203 20:10:46.099405  3290 solver.cpp:246] Iteration 1000, loss = 57.899
I0203 20:10:46.099505  3290 solver.cpp:264] Iteration 1000, Testing net (#0)
I0203 20:10:50.877327  3290 solver.cpp:315]     Test net output #0: loss = 91.646 (* 1 = 91.646 loss)
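
For reference, Caffe's EuclideanLoss is the sum of squared differences over the batch divided by 2N, where N is the batch size. A minimal numpy sketch of that formula (the values below are made up and only mirror the batch_size of 100 and the 30 outputs of the net, not the real data):

import numpy as np

def euclidean_loss(pred, target):
    # Caffe's EuclideanLoss: sum of squared differences / (2 * batch size)
    n = pred.shape[0]
    return np.sum((pred - target) ** 2) / (2.0 * n)

# Hypothetical batch: 100 samples, 30 regression targets each.
pred = np.zeros((100, 30))               # e.g. a network that always outputs zero
target = 5.0 * np.random.randn(100, 30)  # made-up target scale
print(euclidean_loss(pred, target))

With batch_size 100 and 30 outputs, a training loss of about 365 works out to a root-mean-square error of roughly 5 per target.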

Take a look at my layer file:

name: "FKPReg"

layers {
  name: "fkp"
  top: "data"
  top: "label"
  type: HDF5_DATA
  hdf5_data_param {
    source: "train.txt"
    batch_size: 100
  }
  include: { phase: TRAIN }
}

layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "test.txt"
    batch_size: 100
  }
  include: { phase: TEST }
}

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 11
    stride: 2
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 128
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 30
  }
}

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
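
In case it matters, the HDF5_DATA layers read datasets named after their top blobs ("data" and "label") from the .h5 files listed in train.txt / test.txt. A minimal h5py sketch of how such a file can be written (the 96x96 image size, sample count, and file names here are assumptions for illustration):

import h5py
import numpy as np

# Hypothetical training set: N grayscale images with 30 regression targets each.
# The 96x96 size is assumed; replace with the real image dimensions.
N = 1000
images = np.random.rand(N, 1, 96, 96).astype(np.float32)  # (num, channels, height, width)
labels = np.random.rand(N, 30).astype(np.float32)          # 30 targets per sample

with h5py.File('train.h5', 'w') as f:
    # Dataset names must match the layer's top blobs.
    f.create_dataset('data', data=images)
    f.create_dataset('label', data=labels)

# train.txt (the hdf5_data_param source) lists the .h5 file path(s), one per line.
with open('train.txt', 'w') as f:
    f.write('train.h5\n')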





Dharma KC

Jun 19, 2017, 3:16:11 AM
to Caffe Users


Are you able to get the Euclidean loss to converge? I am also stuck on a similar problem.