How to reduce the size of a caffemodel?


fab.m...@gmail.com

Nov 23, 2015, 8:41:17 PM
to Caffe Users
Hi everyone, 
I started using Caffe a few days ago, so as you can understand I'm still a beginner.
I've also read many papers, in particular the ones that report results with AlexNet, Network In Network, GoogleNet, VGG, etc.
I learned that, starting from the very first AlexNet models of roughly 240 MB, the size of a .caffemodel file has been dramatically reduced, first by the Network In Network style (mlpconv layers) and later by GoogleNet's Inception modules (about 30 MB for NIN and 50 MB for GoogleNet), while keeping comparable or even better test performance.

If I'm correct, the size of a Caffe model is in practice the sum of the weights of all layers in the net. Of course, the number of weights in each layer depends on its type, parameters, and output shape.
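
For what it's worth, I tried to verify this by summing the parameter blobs with pycaffe (not sure this is the standard way; deploy.prototxt and net.caffemodel are placeholders for your own files):

    import caffe

    # Placeholder file names; substitute your own model definition and weights
    net = caffe.Net('deploy.prototxt', 'net.caffemodel', caffe.TEST)

    total = 0
    for name, blobs in net.params.items():
        # blobs[0] holds the weights, blobs[1] (if present) the biases
        n = sum(b.data.size for b in blobs)
        print('%-10s %12d params (%5.1f MB)' % (name, n, n * 4 / 1024.0 ** 2))
        total += n
    print('total: %d params, %.1f MB of float32 weights' % (total, total * 4 / 1024.0 ** 2))
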
So I still can't properly understand how NIN and GoogleNet can have smaller models than, for example, AlexNet, given that both have many more layers.
Is it just because the first two don't have the big FC layers that AlexNet has, or is it all thanks to the appropriate use of other layers (for example 1x1 conv layers, pooling layers, concat layers, etc.)?
Can someone please explain?

The reason I want to reduce the size of the .caffemodel file is that in the near future I hope to use Caffe in an Android app, and of course a small file is better for embedded devices with limited memory/resources, and it might load a little faster too.
So if I can understand how to reduce a model (without lowering the performance too much, of course), I can build my own net and train it on my dataset.
Otherwise, fine-tuning a net that produces a small model could be another way to proceed. I want to test both options and compare the results.

Thanks.

Michael Figurnov

Nov 24, 2015, 12:01:23 AM
to Caffe Users
Network size is usually dominated by the fully connected (FC) layers. Let's take a look at AlexNet's FC layers. The first FC layer (fc6) connects an input with 256 channels and 6x6 spatial size to 4096 neurons. Hence, the number of connections (= the number of elements in the weight matrix) is 256*6*6*4096 = 37748736. If we multiply this by 4 (the size of a float in bytes) and divide by 1024*1024 (bytes in a megabyte), we get 144MB. fc7 connects 4096 neurons to 4096 neurons, and its size is 64MB. fc8 connects 4096 neurons to 1000 neurons, and is just 15.6MB. So in total the FC layers take up 223.6MB, almost all of the network's size (the caffemodel is 233MB). Everything else (all the weights and biases of the convolutional layers, plus the biases of the FC layers) takes up just 9.4MB.
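
The same arithmetic in a few lines of Python, if you want to play with it (layer shapes as in the AlexNet deploy.prototxt):

    # Weight-matrix sizes of AlexNet's FC layers (float32 = 4 bytes)
    fc_layers = {
        'fc6': 256 * 6 * 6 * 4096,  # conv5 output (256 x 6 x 6) -> 4096 neurons
        'fc7': 4096 * 4096,
        'fc8': 4096 * 1000,         # 1000 ImageNet classes
    }
    for name, n_weights in sorted(fc_layers.items()):
        print('%s: %d weights = %.1f MB' % (name, n_weights, n_weights * 4 / 1024.0 ** 2))
    # fc6: 37748736 weights = 144.0 MB
    # fc7: 16777216 weights = 64.0 MB
    # fc8: 4096000 weights = 15.6 MB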

So, NIN and GoogleNet are so much smaller than AlexNet and VGG16 mostly due to the lack of FC layers. If you want your model to be small, avoid FC layers.
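
For instance, instead of flattening into a huge FC layer, NIN ends with global average pooling, which has no weights at all. A sketch using pycaffe's NetSpec (the layer names are made up, and a single 1x1 conv stands in for the real convolutional stack):

    import caffe
    from caffe import layers as L, params as P

    n = caffe.NetSpec()
    n.data = L.Input(input_param=dict(shape=dict(dim=[1, 3, 224, 224])))
    # ... the real convolutional layers would go here ...
    n.conv_final = L.Convolution(n.data, num_output=1000, kernel_size=1)
    # Global average pooling: one value per channel, zero parameters
    n.pool = L.Pooling(n.conv_final, pool=P.Pooling.AVE, global_pooling=True)
    n.prob = L.Softmax(n.pool)
    print(n.to_proto())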

You can also explore recent research papers on compression of convnets. Deep Compression squeezes AlexNet down to just 7MB. Tensorizing of FC layers (full disclosure: the paper is from my lab) makes it possible to compress the VGG-16 network by a factor of 7, with one of the FC layers compressed by a factor of 200k. However, if I understand correctly, both approaches would require custom code to implement in Caffe.
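
To give a flavour of the first two stages of Deep Compression (magnitude pruning and weight sharing), here is a toy numpy sketch. It is only an illustration, not the authors' method, which also retrains the pruned net, builds the codebook with k-means, and Huffman-codes the result:

    import numpy as np

    def prune_and_quantize(W, sparsity=0.9, n_bins=32):
        # 1) Prune: zero out the smallest-magnitude weights
        W = W.copy()
        threshold = np.percentile(np.abs(W), sparsity * 100)
        W[np.abs(W) < threshold] = 0.0
        # 2) Quantize the survivors to a small shared codebook
        #    (uniform bins here, to keep the sketch short)
        nz = W != 0
        edges = np.linspace(W[nz].min(), W[nz].max(), n_bins + 1)
        idx = np.clip(np.digitize(W[nz], edges) - 1, 0, n_bins - 1)
        centers = 0.5 * (edges[:-1] + edges[1:])  # the codebook
        W[nz] = centers[idx]
        # On disk you would store sparse indices + short codes + the codebook
        return W

    W = np.random.randn(4096, 4096).astype(np.float32)
    Wc = prune_and_quantize(W)
    print('distinct weight values after compression:', len(np.unique(Wc)))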

On Monday, November 23, 2015 at 8:41:17 PM UTC-5, fab.m...@gmail.com wrote:

fab.m...@gmail.com

Nov 24, 2015, 7:44:13 AM
to Caffe Users
Thanks Michael for the quick and complete response.
Now I understand better how the size of a Caffe model grows.
I will also look into the research papers you mentioned and hope to find more information that can help me.

Thanks again!

Alex Orloff

Jan 19, 2016, 5:45:20 AM
to Caffe Users
By the way, if we reduce the number of FC layers, we reduce the total number of weights.
Do you know how that affects overfitting?
It seems to me that overfitting is hardly possible within convolutional layers; am I right?
Thanks

On Tuesday, November 24, 2015 at 8:01:23 AM UTC+3, Michael Figurnov wrote:

Kalkaneus

Jan 19, 2016, 9:04:24 PM
to Caffe Users
@Alex Orloff
Deep Compression is applied after the whole network has been trained, so we just need to fine-tune after compressing. I think overfitting is less likely to happen in this case than if we compressed the network before training, but I still don't have a solid justification for this.

Nex Jeb

Mar 18, 2016, 7:40:45 AM
to Caffe Users
Does compressing networks lead to a less efficient forward pass at test time?