Memory requirements for ResNet-50 finetuning


alkamid

Sep 8, 2016, 4:40:45 AM
to Caffe Users
Hi. I'm trying to finetune ResNet-50 on my dataset. For the network definition, I'm using a modification of the file from the deepdetect project (https://github.com/beniz/deepdetect/blob/master/templates/caffe/resnet_50/resnet_50.prototxt). I have ~200 training images and ~40 validation images, all of which I resized to ResNet's 224x224 and converted into an LMDB using Caffe's scripts. I had to change the test layer from MemoryData to LMDB because I was getting the following error and wasn't able to fix it:

F0812 14:55:50.548180 32664 memory_data_layer.cpp:41] Check failed: data_ MemoryDataLayer needs to be initalized by calling Reset

I also added the following to Conv layers, to prevent them from learning:

param {
  lr_mult: 0
  decay_mult: 0
}
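
Adding that block by hand to every convolution layer is tedious, so here is a minimal sketch of doing it on the prototxt text. It is a plain-text helper, not part of Caffe: `freeze_layers` is a made-up name, and it assumes each top-level `layer { ... }` block ends with a closing brace on its own line and that no braces appear inside strings or comments. It adds a single `param` block per layer, as in the snippet above; as far as I know Caffe matches `param` entries to a layer's blobs in order, so a layer with a bias term would need a second block to freeze the bias too.

```python
# Text inserted into each matching layer (same values as the snippet above).
FROZEN_PARAM = "  param {\n    lr_mult: 0\n    decay_mult: 0\n  }"


def freeze_layers(prototxt, layer_type="Convolution"):
    """Return prototxt text with a zero-lr param block added to matching layers.

    Assumption: every top-level block closes with '}' on a line of its own.
    Layers that already mention lr_mult are left untouched.
    """
    out = []
    block = []   # lines of the top-level block currently being read
    depth = 0    # current brace nesting depth
    for line in prototxt.splitlines():
        opens, closes = line.count("{"), line.count("}")
        if depth == 0 and opens == 0:
            out.append(line)  # line outside any block (name:, blank, ...)
            continue
        block.append(line)
        depth += opens - closes
        if depth == 0:  # the top-level block just closed
            text = "\n".join(block)
            is_target = (
                block[0].strip().startswith("layer")
                and f'type: "{layer_type}"' in text
                and "lr_mult" not in text
            )
            if is_target:
                # Put the param block just before the closing brace line.
                block.insert(len(block) - 1, FROZEN_PARAM)
            out.extend(block)
            block = []
    return "\n".join(out)
```

Usage would be along the lines of reading the net definition into a string, calling `freeze_layers(text)`, and writing the result to a new file, then checking the output by eye before training with it.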

I changed the last FC layer to output 2 classes:

layer {
  bottom: "pool5"
  top: "my-fc1000"
  name: "my-fc1000"
  type: "InnerProduct"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "my-fc1000"
  bottom: "label"
  top: "prob"
  name: "prob"
  type: "SoftmaxWithLoss"
  include {
    phase: TRAIN
  }
}
layer {
  name: "probt"
  type: "Softmax"
  bottom: "my-fc1000"
  top: "probt"
  include {
    phase: TEST
  }
}

And my final net and solver protofiles are here:

I then downloaded the model parameters provided by Kaiming He (https://github.com/KaimingHe/deep-residual-networks) and attempted to run finetuning on a g2.8xlarge AWS instance. It has 4 GPUs, each with 4 GB of memory. I know that ResNets are particularly memory-hungry, but I thought it should be fine if I only train the final layer, and in the extreme case (shown above) I reduced batch_size to 1. Nevertheless, I'm getting the following error:

$ /home/ubuntu/caffe/build/tools/caffe train -model res50_train_val_mod.prototxt -solver solver_mod.prototxt -weights ResNet-50-model.caffemodel -gpu=all

...
I0702 18:52:15.528509 18257 net.cpp:274] Network initialization done.
I0702 18:52:15.529568 18257 solver.cpp:60] Solver scaffolding done.
I0702 18:52:15.540081 18257 caffe.cpp:129] Finetuning from ResNet-50-model.caffemodel
I0702 18:52:15.711124 18257 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: ResNet-50-model.caffemodel
I0702 18:52:15.711189 18257 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
W0702 18:52:15.711201 18257 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
I0702 18:52:15.733320 18257 net.cpp:752] Ignoring source layer fc1000
I0702 18:52:15.918457 18257 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: ResNet-50-model.caffemodel
I0702 18:52:15.918534 18257 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
W0702 18:52:15.918545 18257 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
I0702 18:52:15.944305 18257 net.cpp:752] Ignoring source layer fc1000
I0702 18:52:15.944381 18257 net.cpp:752] Ignoring source layer prob
I0702 18:52:15.948089 18257 caffe.cpp:219] Starting Optimization
I0702 18:52:15.948125 18257 solver.cpp:279] Solving ResNet-50
I0702 18:52:15.948144 18257 solver.cpp:280] Learning Rate Policy: step
I0702 18:52:15.961205 18257 solver.cpp:337] Iteration 0, Testing net (#0)
I0702 18:52:15.985610 18257 net.cpp:684] Ignoring source layer prob
I0702 18:52:47.178249 18257 solver.cpp:404]     Test net output #0: label = 0.27
I0702 18:52:47.178784 18257 solver.cpp:404]     Test net output #1: label = 0.27
I0702 18:52:47.178804 18257 solver.cpp:404]     Test net output #2: label = 0.25
I0702 18:52:47.178823 18257 solver.cpp:404]     Test net output #3: label = 0.27
I0702 18:52:47.178836 18257 solver.cpp:404]     Test net output #4: label = 0.27
I0702 18:52:47.178848 18257 solver.cpp:404]     Test net output #5: label = 0.26
I0702 18:52:47.178860 18257 solver.cpp:404]     Test net output #6: label = 0.27
I0702 18:52:47.178872 18257 solver.cpp:404]     Test net output #7: label = 0.27
I0702 18:52:47.178882 18257 solver.cpp:404]     Test net output #8: label = 0.26
I0702 18:52:47.178894 18257 solver.cpp:404]     Test net output #9: label = 0.27
I0702 18:52:47.178906 18257 solver.cpp:404]     Test net output #10: label = 0.26
I0702 18:52:47.178918 18257 solver.cpp:404]     Test net output #11: label = 0.26
I0702 18:52:47.178930 18257 solver.cpp:404]     Test net output #12: label = 0.26
I0702 18:52:47.178942 18257 solver.cpp:404]     Test net output #13: label = 0.25
I0702 18:52:47.178969 18257 solver.cpp:404]     Test net output #14: label = 0.27
I0702 18:52:47.178985 18257 solver.cpp:404]     Test net output #15: label = 0.25
I0702 18:52:47.178997 18257 solver.cpp:404]     Test net output #16: probt = 0.441688
I0702 18:52:47.179010 18257 solver.cpp:404]     Test net output #17: probt = 0.558312
I0702 18:52:47.179025 18257 solver.cpp:404]     Test net output #18: probt = 0.444477
I0702 18:52:47.179049 18257 solver.cpp:404]     Test net output #19: probt = 0.555523
I0702 18:52:47.179064 18257 solver.cpp:404]     Test net output #20: probt = 0.441312
I0702 18:52:47.179076 18257 solver.cpp:404]     Test net output #21: probt = 0.558689
I0702 18:52:47.179097 18257 solver.cpp:404]     Test net output #22: probt = 0.430302
I0702 18:52:47.179114 18257 solver.cpp:404]     Test net output #23: probt = 0.569698
I0702 18:52:47.179127 18257 solver.cpp:404]     Test net output #24: probt = 0.436726
I0702 18:52:47.179149 18257 solver.cpp:404]     Test net output #25: probt = 0.563274
I0702 18:52:47.179163 18257 solver.cpp:404]     Test net output #26: probt = 0.434183
I0702 18:52:47.179188 18257 solver.cpp:404]     Test net output #27: probt = 0.565817
I0702 18:52:47.179203 18257 solver.cpp:404]     Test net output #28: probt = 0.434018
I0702 18:52:47.179225 18257 solver.cpp:404]     Test net output #29: probt = 0.565982
I0702 18:52:47.179239 18257 solver.cpp:404]     Test net output #30: probt = 0.434224
I0702 18:52:47.179256 18257 solver.cpp:404]     Test net output #31: probt = 0.565776
I0702 18:52:47.179271 18257 solver.cpp:404]     Test net output #32: probt = 0.431346
I0702 18:52:47.179289 18257 solver.cpp:404]     Test net output #33: probt = 0.568654
I0702 18:52:47.179306 18257 solver.cpp:404]     Test net output #34: probt = 0.42653
I0702 18:52:47.179338 18257 solver.cpp:404]     Test net output #35: probt = 0.57347
I0702 18:52:47.179373 18257 solver.cpp:404]     Test net output #36: probt = 0.447015
I0702 18:52:47.179406 18257 solver.cpp:404]     Test net output #37: probt = 0.552985
I0702 18:52:47.179437 18257 solver.cpp:404]     Test net output #38: probt = 0.424695
I0702 18:52:47.179467 18257 solver.cpp:404]     Test net output #39: probt = 0.575305
I0702 18:52:47.179483 18257 solver.cpp:404]     Test net output #40: probt = 0.436721
I0702 18:52:47.179502 18257 solver.cpp:404]     Test net output #41: probt = 0.563279
I0702 18:52:47.179515 18257 solver.cpp:404]     Test net output #42: probt = 0.434268
I0702 18:52:47.179533 18257 solver.cpp:404]     Test net output #43: probt = 0.565732
I0702 18:52:47.179548 18257 solver.cpp:404]     Test net output #44: probt = 0.435547
I0702 18:52:47.179569 18257 solver.cpp:404]     Test net output #45: probt = 0.564453
I0702 18:52:47.179584 18257 solver.cpp:404]     Test net output #46: probt = 0.427467
I0702 18:52:47.179603 18257 solver.cpp:404]     Test net output #47: probt = 0.572533
F0702 18:52:47.238312 18257 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f1c6fbdcdaa  (unknown)
    @     0x7f1c6fbdcce4  (unknown)
    @     0x7f1c6fbdc6e6  (unknown)
    @     0x7f1c6fbdf687  (unknown)
    @     0x7f1c70343511  caffe::SyncedMemory::to_gpu()
    @     0x7f1c70342879  caffe::SyncedMemory::mutable_gpu_data()
    @     0x7f1c70227ff2  caffe::Blob<>::mutable_gpu_data()
    @     0x7f1c70368ea3  caffe::EltwiseLayer<>::Forward_gpu()
    @     0x7f1c70239af5  caffe::Net<>::ForwardFromTo()
    @     0x7f1c70239e67  caffe::Net<>::Forward()
    @     0x7f1c70349227  caffe::Solver<>::Step()
    @     0x7f1c70349ae9  caffe::Solver<>::Solve()
    @           0x40846e  train()
    @           0x405cbc  main
    @     0x7f1c6e736ec5  (unknown)
    @           0x40648d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

Is it true that 4 GB is not enough for finetuning ResNet-50, or am I simply not initialising my net properly? I'd be very grateful for any hints.
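
A rough way to reason about the question above is to count bytes per blob. The sketch below is an illustration rather than a measurement: it assumes 4-byte floats, uses the standard ResNet-50 conv1 output shape (64 channels at 112x112 for a 224x224 input), and the helper names `blob_bytes` and `to_mib` are made up for this example. The second batch size, 16, is an arbitrary comparison value. Weights, cuDNN workspaces and the CUDA context are ignored, so real usage will be higher.

```python
BYTES_PER_FLOAT = 4  # assumption: blobs hold 32-bit floats


def blob_bytes(batch, channels, height, width, with_diff=False):
    """Bytes for one 4-D blob; with_diff adds an equally sized gradient buffer."""
    count = batch * channels * height * width
    copies = 2 if with_diff else 1
    return count * BYTES_PER_FLOAT * copies


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / 2**20


# conv1 output of ResNet-50 for 224x224 inputs: 64 channels at 112x112.
for batch in (1, 16):
    data_only = blob_bytes(batch, 64, 112, 112)
    with_grad = blob_bytes(batch, 64, 112, 112, with_diff=True)
    print(f"batch={batch}: {to_mib(data_only):.2f} MiB data, "
          f"{to_mib(with_grad):.2f} MiB with diff")
```

Every intermediate blob in the net adds a term like this, scaled by the batch size of the phase it belongs to, and as far as I understand a TEST net keeps its own activation blobs alongside the TRAIN net, so the test batch size matters as well as the training one.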

康洋

Sep 12, 2016, 4:12:47 AM
to Caffe Users
In my case, I use ResNet-50 in both train_net and test_net, and the total memory cost is 6 GB.

alkamid

Sep 16, 2016, 5:23:11 AM
to Caffe Users
But are you freezing the inner layers? I managed to run ResNet-50 via deepdetect and it takes 3GB of memory, so I'm really not sure why pure caffe requires more.
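
To compare figures like the 3 GB and 6 GB quoted in this thread, it helps to read per-GPU usage while each setup is actually running. A minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_gpu_memory` and `read_gpu_memory` are hypothetical helper names, and the values are in MiB as nvidia-smi reports them with `nounits`.

```python
import subprocess

# Query index, used and total memory for every GPU as bare CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Return {gpu_index: (used_mib, total_mib)} from nvidia-smi CSV output."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        usage[index] = (used, total)
    return usage


def read_gpu_memory():
    """Run nvidia-smi once and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `read_gpu_memory()` in a second terminal during the test phase and again after training starts should show on which GPU, and at which stage, the memory runs out.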

Uday Kusupati

May 31, 2017, 10:23:19 PM
to Caffe Users
I have the same problem. Running on a single GPU gave no error, though, so I think the problem is in running on multiple GPUs.