Caffe error rate not converging?

olddocks

Feb 3, 2015, 2:19:13 PM
to caffe...@googlegroups.com
Hi

I have finally gotten my network to predict the results of a linear regression, but it seems the network is stuck and not converging. I have tried learning rates of 0.001, 0.01, 0.1, and 1.0. I have 96x grayscale images, and I run 4 convolution and 4 pooling layers to predict 30 outputs.

I0203 20:09:22.603837  3290 solver.cpp:445] Iteration 500, lr = 1
I0203 20:09:39.274199  3290 solver.cpp:209] Iteration 600, loss = 365.468
I0203 20:09:39.274451  3290 solver.cpp:224]     Train net output #0: loss = 365.467 (* 1 = 365.467 loss)
I0203 20:09:39.274487  3290 solver.cpp:445] Iteration 600, lr = 1
I0203 20:09:55.949723  3290 solver.cpp:209] Iteration 700, loss = 57.9001
I0203 20:09:55.949863  3290 solver.cpp:224]     Train net output #0: loss = 57.899 (* 1 = 57.899 loss)
I0203 20:09:55.949892  3290 solver.cpp:445] Iteration 700, lr = 1
I0203 20:10:12.633570  3290 solver.cpp:209] Iteration 800, loss = 386.487
I0203 20:10:12.633800  3290 solver.cpp:224]     Train net output #0: loss = 386.486 (* 1 = 386.486 loss)
I0203 20:10:12.633837  3290 solver.cpp:445] Iteration 800, lr = 1
I0203 20:10:29.324326  3290 solver.cpp:209] Iteration 900, loss = 365.468
I0203 20:10:29.324460  3290 solver.cpp:224]     Train net output #0: loss = 365.467 (* 1 = 365.467 loss)
I0203 20:10:29.324491  3290 solver.cpp:445] Iteration 900, lr = 1
I0203 20:10:46.017035  3290 solver.cpp:334] Snapshotting to /home/pbu/Desktop/tmp_iter_1000.caffemodel
I0203 20:10:46.038336  3290 solver.cpp:342] Snapshotting solver state to /home/pbu/Desktop/tmp_iter_1000.solverstate
I0203 20:10:46.099405  3290 solver.cpp:246] Iteration 1000, loss = 57.899
I0203 20:10:46.099505  3290 solver.cpp:264] Iteration 1000, Testing net (#0)
I0203 20:10:50.877327  3290 solver.cpp:315]     Test net output #0: loss = 91.646 (* 1 = 91.646 loss)
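
For reference, Caffe's EuclideanLoss is the sum of squared differences over the batch divided by 2N, where N is the batch size. A minimal numpy sketch of that formula (the values below are made up and only mirror the batch_size of 100 and the 30 outputs of the net, not the real data):

import numpy as np

def euclidean_loss(pred, target):
    # Caffe's EuclideanLoss: sum of squared differences / (2 * batch size)
    n = pred.shape[0]
    return np.sum((pred - target) ** 2) / (2.0 * n)

# Hypothetical batch: 100 samples, 30 regression targets each.
pred = np.zeros((100, 30))               # e.g. a network that always outputs zero
target = 5.0 * np.random.randn(100, 30)  # made-up target scale
print(euclidean_loss(pred, target))

With batch_size 100 and 30 outputs, a training loss of about 365 works out to a root-mean-square error of roughly 5 per target.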

Take a look at my layer file:

name: "FKPReg"

layers {
  name: "fkp"
  top: "data"
  top: "label"
  type: HDF5_DATA
  hdf5_data_param {
    source: "train.txt"
    batch_size: 100
  }
  include: { phase: TRAIN }
}

layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "test.txt"
    batch_size: 100
  }
  include: { phase: TEST }
}

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 11
    stride: 2
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 128
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 30
  }
}

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
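
In case it matters, the HDF5_DATA layers read datasets named after their top blobs ("data" and "label") from the .h5 files listed in train.txt / test.txt. A minimal h5py sketch of how such a file can be written (the 96x96 image size, sample count, and file names here are assumptions for illustration):

import h5py
import numpy as np

# Hypothetical training set: N grayscale images with 30 regression targets each.
# The 96x96 size is assumed; replace with the real image dimensions.
N = 1000
images = np.random.rand(N, 1, 96, 96).astype(np.float32)  # (num, channels, height, width)
labels = np.random.rand(N, 30).astype(np.float32)          # 30 targets per sample

with h5py.File('train.h5', 'w') as f:
    # Dataset names must match the layer's top blobs.
    f.create_dataset('data', data=images)
    f.create_dataset('label', data=labels)

# train.txt (the hdf5_data_param source) lists the .h5 file path(s), one per line.
with open('train.txt', 'w') as f:
    f.write('train.h5\n')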





Dharma KC

Jun 19, 2017, 3:16:11 AM
to Caffe Users


Are you able to get the Euclidean loss to converge? I am also stuck on a similar problem.