Caffe model fails to learn

peterbe...@gmail.com

Oct 1, 2018, 4:01:10 AM
to Caffe Users

I have the following convolutional model implemented in Keras; after training for 100,000 epochs, it shows excellent performance with great accuracy.

img_rows, img_cols = 24, 15
input_shape = (img_rows, img_cols, 1)
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

However, after implementing what I believe is the same model in Caffe, it fails to train: the loss stays almost fixed between 2.1 and 2.6. (With 11 output classes, a uniform prediction gives a loss of ln(11) ≈ 2.4, so the network is essentially guessing at random.) Here is my Caffe prototxt implementation:

name: "FneishNet" layer { name: "inlayer1" type: "Data" top: "data" top: "label" include { phase: TRAIN } data_param { source: "examples/fneishnet_numbers/fneishnet_numbers_train_lmdb" batch_size: 128 backend: LMDB } } layer { name: "inlayer1" type: "Data" top: "data" top: "label" include { phase: TEST } data_param { source: "examples/fneishnet_numbers/fneishnet_numbers_val_lmdb" batch_size: 64 backend: LMDB } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 32 kernel_size: 3 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "conv2" type: "Convolution" bottom: "conv1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 32 kernel_size: 3 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "pool1" type: "Pooling" bottom: "conv2" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 1 } } layer { name: "drop1" type: "Dropout" bottom: "pool1" top: "pool1" dropout_param { dropout_ratio: 0.25 } } layer { name: "flatten1" type: "Flatten" bottom: "pool1" top: "flatten1" } layer { name: "fc1" type: "InnerProduct" bottom: "flatten1" top: "fc1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 128 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "relu3" type: "ReLU" bottom: "fc1" top: "fc1" } layer { name: "drop2" type: "Dropout" bottom: "fc1" top: "fc1" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc2" type: "InnerProduct" bottom: "fc1" top: "fc2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 11 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "accuracy" type: "Accuracy" bottom: "fc2" bottom: "label" top: "accuracy" include { phase: TEST } } layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc2" bottom: "label" top: "loss" }

And here is my model solver (hyper-parameters):

net: "models/fneishnet_numbers/train_val.prototxt" test_iter: 1000 test_interval: 4000 test_initialization: false display: 40 average_loss: 40 base_lr: 0.01 gamma: 0.1 lr_policy: "poly" power: 0.5 max_iter: 3000000 momentum: 0.9 weight_decay: 0.0005 snapshot: 100000 snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick" solver_mode: CPU

I believe that if I translated the model into Caffe correctly, it should perform the same way it does in Keras, so I think I have missed something.
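
To double-check the translation layer by layer, dumping the blob shapes with pycaffe and comparing them against model.summary() in Keras is a quick sanity check. A minimal sketch, assuming pycaffe is built and the script runs from the Caffe root so the LMDB paths above resolve:

import caffe

# Instantiate the net in TEST phase, so the val LMDB data layer is used.
net = caffe.Net('models/fneishnet_numbers/train_val.prototxt', caffe.TEST)

# Print every blob's shape to compare against the Keras layer output shapes.
for name, blob in net.blobs.items():
    print('{}: {}'.format(name, blob.data.shape))

One mismatch a dump like this would surface: pool1 above uses stride 1, whereas Keras MaxPooling2D with pool_size=(2, 2) defaults to stride 2, so the two models diverge from the pooling layer onward.

Any help would be appreciated, thanks.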

peterbe...@gmail.com

Oct 2, 2018, 1:15:38 AM
to Caffe Users

Thanks to @Dusa, whose answer here saved me.

I solved my problem by changing my hyper-parameters to match the Adam optimizer's specifications, as shown here.
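
For anyone who lands here: the original solver never sets a type, so Caffe trains with its default SGD-with-momentum, while the Keras model used an adaptive optimizer (Adadelta). A minimal sketch of an Adam solver, with field names taken from Caffe's examples/mnist/lenet_solver_adam.prototxt and the values being Adam's usual defaults rather than my exact final settings:

net: "models/fneishnet_numbers/train_val.prototxt"
test_iter: 1000
test_interval: 4000
# Adam adapts the per-parameter step size, so keep the base rate small and fixed.
type: "Adam"
base_lr: 0.001
lr_policy: "fixed"
# Adam's beta1, beta2, and epsilon (Caffe calls them momentum, momentum2, delta).
momentum: 0.9
momentum2: 0.999
delta: 1e-8
max_iter: 3000000
snapshot: 100000
snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick"
solver_mode: CPU

Since Adam already scales each parameter's effective step, the poly learning-rate schedule from the original solver is dropped here; a fixed policy is what Caffe's own Adam example uses.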
