Copy the Keras model architecture to Caffe and train it


Shawn Lee

Nov 17, 2017, 2:05:08 AM
to Caffe Users
Hi All,

I have a model with good performance in Keras, shown below:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import RMSprop

def create_model2():
    # input_shape and num_classes come from my dataset
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Conv2D(32, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model


model = create_model2()
rms = RMSprop()
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=rms,
              metrics=['accuracy'])
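
For completeness, I train it roughly like this (x_train/y_train and the epoch count are placeholders for my actual data; the batch size matches the Caffe data layer below). Note that categorical_crossentropy expects one-hot labels, so I convert them with to_categorical:

from keras.utils import to_categorical

# x_train / y_train are placeholders for the data I load from LMDB
y_train_1hot = to_categorical(y_train, num_classes)
model.fit(x_train, y_train_1hot,
          batch_size=6,   # same as batch_size: 6 in the prototxt
          epochs=10)      # placeholder epoch count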

I tried to copy the architecture to Caffe, as below.
I don't expect identical performance, but it should be similar with the same parameters on the same dataset.

prototxt:
name: ""
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/train/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/test/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 1
    weight_filler { type: "uniform" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "flatdata"
  type: "Flatten"
  bottom: "pool3"
  top: "flatdata"
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 128
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 2
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  top: "accuracies"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

solver:
................
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
# The learning rate policy
#lr_policy: "inv"
lr_policy: "poly"
gamma: 0.0001
power: 0.75
type: "RMSProp"
rms_decay: 0.98
......................
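
As far as I can tell, Keras's RMSprop and Caffe's RMSProp solver implement the same update rule, with rho mapping to rms_decay and lr to base_lr. Keras RMSprop() defaults to lr=0.001 and rho=0.9, whereas I set base_lr: 0.01 and rms_decay: 0.98, and the poly lr_policy additionally decays the rate over time, which plain Keras RMSprop does not. My reading of the shared update rule, as a sketch (not the actual library code):

import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # Keras: lr, rho=decay; Caffe: base_lr, rms_decay=decay
    cache = decay * cache + (1.0 - decay) * grad ** 2  # running mean of g^2
    w = w - lr * grad / (np.sqrt(cache) + eps)         # scaled gradient step
    return w, cache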

and I get this result: the training loss doesn't decrease.


I1117 14:55:46.197170  5362 sgd_solver.cpp:106] Iteration 595, lr = 0.0957578
I1117 14:55:46.530448  5362 solver.cpp:228] Iteration 596, loss = 0.716614
I1117 14:55:46.530511  5362 solver.cpp:244]     Train net output #0: loss = 0.716614 (* 1 = 0.716614 loss)
I1117 14:55:46.530525  5362 sgd_solver.cpp:106] Iteration 596, lr = 0.095751
I1117 14:55:46.875291  5362 solver.cpp:228] Iteration 597, loss = 0.76887
I1117 14:55:46.875366  5362 solver.cpp:244]     Train net output #0: loss = 0.76887 (* 1 = 0.76887 loss)
I1117 14:55:46.875385  5362 sgd_solver.cpp:106] Iteration 597, lr = 0.0957443
I1117 14:55:47.215499  5362 solver.cpp:228] Iteration 598, loss = 0.733329
I1117 14:55:47.215590  5362 solver.cpp:244]     Train net output #0: loss = 0.733329 (* 1 = 0.733329 loss)
I1117 14:55:47.215610  5362 sgd_solver.cpp:106] Iteration 598, lr = 0.0957375
I1117 14:55:47.551973  5362 solver.cpp:228] Iteration 599, loss = 0.751441
I1117 14:55:47.552047  5362 solver.cpp:244]     Train net output #0: loss = 0.75144 (* 1 = 0.75144 loss)
......................
I1117 14:55:48.606860  5362 solver.cpp:404]     Test net output #0: accuracies = 0
I1117 14:55:48.606914  5362 solver.cpp:404]     Test net output #1: accuracies = 0.9
I1117 14:55:48.606925  5362 solver.cpp:404]     Test net output #2: accuracy = 0.5
I1117 14:55:48.606937  5362 solver.cpp:404]     Test net output #3: loss = 0.697836 (* 1 = 0.697836 loss)
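
What strikes me is that the training loss just hovers around 0.69-0.77: ln(2) ≈ 0.693 is exactly the cross-entropy of a 2-class model stuck at a 50/50 prediction, so the net seems to sit at chance level rather than learn anything:

import numpy as np

# cross-entropy of a uniform 2-class prediction: -log(0.5) = ln(2)
p = np.array([0.5, 0.5])   # softmax output stuck at chance
print(-np.log(p[0]))       # 0.6931... -- about where my loss sits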


What did I do wrong?
As far as I know, Caffe doesn't support categorical_crossentropy directly.
How can I fix it?
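
For reference, this is what I understand Keras's categorical_crossentropy to compute on one-hot labels; my (unverified) understanding is that Caffe's SoftmaxWithLoss computes the same value from raw scores and an integer class label:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

scores = np.array([1.2, -0.3])        # raw ip2 outputs for one sample
probs = softmax(scores)               # what the Keras softmax layer emits
one_hot = np.array([1.0, 0.0])        # Keras-style one-hot label
keras_loss = -(one_hot * np.log(probs)).sum()  # categorical_crossentropy
caffe_loss = -np.log(probs[0])        # SoftmaxWithLoss with integer label 0
print(keras_loss, caffe_loss)         # both print the same value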

Thanks.


