
Copy the Keras model architecture to Caffe and train it


Shawn Lee

Nov 17, 2017, 2:05:08 AM
to Caffe Users
Hi All,

I have a model with good performance in Keras, as below:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import RMSprop

# input_shape and num_classes are defined elsewhere in my script
def create_model2():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Conv2D(32, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model


model = create_model2()
rms = RMSprop()
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=rms,
              metrics=['accuracy'])
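
For reference, the per-layer output shapes can be dumped with the standard Keras API to compare against the Caffe blobs later (just a quick sanity check on the model built above):

# Print each layer's name and output shape; Keras reports NHWC here,
# while the Caffe blobs below are NCHW, so only the dimensions need to match.
model.summary()
for layer in model.layers:
    print(layer.name, layer.output_shape)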

I tried to copy the architecture to Caffe, as below.
I don't expect identical performance, but it should be similar with the same parameters on the same dataset.

prototxt:
name: ""
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/train/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/test/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "flatdata"
  type: "Flatten"
  bottom: "pool3"
  top: "flatdata"
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 128
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 2
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  top: "accuracies"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

solver:
................
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
# The learning rate policy
#lr_policy: "inv"
lr_policy: "poly"
gamma: 0.0001
power: 0.75
type: "RMSProp"
rms_decay: 0.98
......................
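
To check these solver values against what Keras actually uses, the optimizer's configuration can be printed (a minimal sketch; Keras RMSprop defaults to lr=0.001 and rho=0.9, which differ from the base_lr: 0.01 and rms_decay: 0.98 above):

from keras.optimizers import RMSprop

# Dump the optimizer settings Keras trains with, to compare against the
# Caffe solver fields (base_lr vs lr, rms_decay vs rho).
rms = RMSprop()
print(rms.get_config())  # e.g. {'lr': 0.001, 'rho': 0.9, 'decay': 0.0, 'epsilon': 1e-07}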

And I get this result: the training loss doesn't decrease.


I1117 14:55:46.197170  5362 sgd_solver.cpp:106] Iteration 595, lr = 0.0957578
I1117 14:55:46.530448  5362 solver.cpp:228] Iteration 596, loss = 0.716614
I1117 14:55:46.530511  5362 solver.cpp:244]     Train net output #0: loss = 0.716614 (* 1 = 0.716614 loss)
I1117 14:55:46.530525  5362 sgd_solver.cpp:106] Iteration 596, lr = 0.095751
I1117 14:55:46.875291  5362 solver.cpp:228] Iteration 597, loss = 0.76887
I1117 14:55:46.875366  5362 solver.cpp:244]     Train net output #0: loss = 0.76887 (* 1 = 0.76887 loss)
I1117 14:55:46.875385  5362 sgd_solver.cpp:106] Iteration 597, lr = 0.0957443
I1117 14:55:47.215499  5362 solver.cpp:228] Iteration 598, loss = 0.733329
I1117 14:55:47.215590  5362 solver.cpp:244]     Train net output #0: loss = 0.733329 (* 1 = 0.733329 loss)
I1117 14:55:47.215610  5362 sgd_solver.cpp:106] Iteration 598, lr = 0.0957375
I1117 14:55:47.551973  5362 solver.cpp:228] Iteration 599, loss = 0.751441
I1117 14:55:47.552047  5362 solver.cpp:244]     Train net output #0: loss = 0.75144 (* 1 = 0.75144 loss)
......................
I1117 14:55:48.606860  5362 solver.cpp:404]     Test net output #0: accuracies = 0
I1117 14:55:48.606914  5362 solver.cpp:404]     Test net output #1: accuracies = 0.9
I1117 14:55:48.606925  5362 solver.cpp:404]     Test net output #2: accuracy = 0.5
I1117 14:55:48.606937  5362 solver.cpp:404]     Test net output #3: loss = 0.697836 (* 1 = 0.697836 loss)


What did I do wrong?
As far as I know, Caffe doesn't support categorical_crossentropy.
How can I fix it?

Thanks.


