Hi All,
I have a model in Keras that performs well, shown below:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import RMSprop

def create_model2():
    # input_shape and num_classes are defined elsewhere in my script
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Conv2D(32, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model = create_model2()
rms = RMSprop()
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=rms,
              metrics=['accuracy'])
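To compare the two nets layer by layer, it helps to dump the Keras side's shapes and parameter counts first; a short sketch using the model built above:

# Print each layer's output shape and parameter count; Caffe logs the
# equivalent blob shapes when it constructs the net from the prototxt.
model.summary()

for layer in model.layers:
    # Each weight tensor should match the corresponding Caffe param blob,
    # up to the axis-order difference (Keras HWIO vs Caffe OIHW kernels).
    print(layer.name, [w.shape for w in layer.get_weights()])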
I tried to copy the architecture to Caffe, as below.
I don't expect identical performance, but with the same parameters on the same dataset the results should be similar.
prototxt:
name: ""
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/train/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "test_from_keras"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/images/test/img_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "uniform"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "uniform"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "uniform"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "flatdata"
  type: "Flatten"
  bottom: "pool3"
  top: "flatdata"
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  top: "accuracies"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
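To rule out shape mismatches between the two nets, one thing I can do is load the prototxt in pycaffe and print every blob and parameter shape, then compare against model.summary() on the Keras side. A minimal sketch (the filename is just a placeholder for my actual prototxt path):

import caffe

net = caffe.Net('train_val.prototxt', caffe.TEST)  # placeholder path

# Shapes of the intermediate blobs (activations), in NCHW order.
for blob_name, blob in net.blobs.items():
    print(blob_name, blob.data.shape)

# Shapes of the learnable parameters (weights and biases) per layer.
for layer_name, params in net.params.items():
    print(layer_name, [p.data.shape for p in params])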
solver:
................
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
# The learning rate policy
#lr_policy: "inv"
lr_policy: "poly"
gamma: 0.0001
power: 0.75
type: "RMSProp"
rms_decay: 0.98
......................
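For comparison, the Keras side calls RMSprop() with no arguments; spelled out explicitly, the defaults (worth re-checking against the installed Keras version) differ from the solver values above:

from keras.optimizers import RMSprop

rms = RMSprop(lr=0.001,  # vs base_lr: 0.01 in the solver
              rho=0.9)   # vs rms_decay: 0.98 in the solver

Keras also applies no learning-rate schedule by default, whereas the solver uses lr_policy: "poly".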
and I get this result:
I1117 14:55:46.197170 5362 sgd_solver.cpp:106] Iteration 595, lr = 0.0957578
I1117 14:55:46.530448 5362 solver.cpp:228] Iteration 596, loss = 0.716614
I1117 14:55:46.530511 5362 solver.cpp:244] Train net output #0: loss = 0.716614 (* 1 = 0.716614 loss)
I1117 14:55:46.530525 5362 sgd_solver.cpp:106] Iteration 596, lr = 0.095751
I1117 14:55:46.875291 5362 solver.cpp:228] Iteration 597, loss = 0.76887
I1117 14:55:46.875366 5362 solver.cpp:244] Train net output #0: loss = 0.76887 (* 1 = 0.76887 loss)
I1117 14:55:46.875385 5362 sgd_solver.cpp:106] Iteration 597, lr = 0.0957443
I1117 14:55:47.215499 5362 solver.cpp:228] Iteration 598, loss = 0.733329
I1117 14:55:47.215590 5362 solver.cpp:244] Train net output #0: loss = 0.733329 (* 1 = 0.733329 loss)
I1117 14:55:47.215610 5362 sgd_solver.cpp:106] Iteration 598, lr = 0.0957375
I1117 14:55:47.551973 5362 solver.cpp:228] Iteration 599, loss = 0.751441
I1117 14:55:47.552047 5362 solver.cpp:244] Train net output #0: loss = 0.75144 (* 1 = 0.75144 loss)
......................
I1117 14:55:48.606860 5362 solver.cpp:404] Test net output #0: accuracies = 0
I1117 14:55:48.606914 5362 solver.cpp:404] Test net output #1: accuracies = 0.9
I1117 14:55:48.606925 5362 solver.cpp:404] Test net output #2: accuracy = 0.5
I1117 14:55:48.606937 5362 solver.cpp:404] Test net output #3: loss = 0.697836 (* 1 = 0.697836 loss)
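For what it's worth, the slowly shrinking lr values in that log are what Caffe's "poly" policy produces; as far as I know the formula it applies is the following (max_iter comes from the elided part of my solver):

def poly_lr(base_lr, iteration, max_iter, power):
    # Caffe "poly" policy: base_lr * (1 - iter/max_iter)^power
    return base_lr * (1.0 - float(iteration) / max_iter) ** power

Note that gamma is only read by policies such as "inv" or "exp", so my gamma: 0.0001 line should have no effect under "poly".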
What did I do wrong?
As far as I know, Caffe doesn't provide a loss named categorical_crossentropy. How can I fix it?
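For reference, here is a small numpy sketch of how I understand Caffe's SoftmaxWithLoss relates to Keras's softmax activation plus categorical_crossentropy (integer labels versus one-hot vectors); if the two really compute the same quantity, the loss choice itself may not be the problem:

import numpy as np

def softmax(z):
    # numerically stable softmax over the class axis
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, -1.0], [0.5, 1.5]])  # e.g. ip2 outputs, 2 classes
labels = np.array([0, 1])                     # integer labels, as in LMDB

# Caffe SoftmaxWithLoss: mean negative log-probability of the true class.
probs = softmax(logits)
caffe_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Keras: categorical_crossentropy on one-hot labels after a softmax layer.
one_hot = np.eye(2)[labels]
keras_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(caffe_loss, keras_loss)  # both print the same value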
Thanks.