Hi all,
I'm currently working with an embedded device with limited onboard memory. I've already applied
https://github.com/BVLC/caffe/pull/2009, which helps a little. I'm using a fully convolutional model, and right now memory usage seems to peak around 600MB. I need to get that under 450MB, so I'm considering the following options and wondering whether they are possible, or already implemented somewhere I haven't found.
Using half-precision floating point: this seems like the simplest option, and from what I understand it shouldn't affect classification error too much. I just have no idea how to do it in Caffe (the first sketch after this list is how I'd check the accuracy impact).
Discarding lower-layer blob data as the network evaluates: seems like an ugly hack, but maybe a good idea nonetheless.
Chopping up my network's convolutional fc6 layer: I'd rather not, but may have to.
Sparse matrices: something I've been reading about. I'm not sure how hard it would be to implement, and it seems like it might reduce the memory footprint but still take quite a while to evaluate.
Low-rank approximations: this seems like it would reduce the parameter memory, but not the data or buffer sizes? (The second sketch after this list shows what I have in mind.)
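
To check the accuracy impact of half precision before worrying about how to store it, I was thinking of something along these lines. This is only a rough sketch: it simulates fp16 storage by rounding every parameter through float16 and back, it doesn't actually halve the memory, and 'deploy.prototxt' / 'weights.caffemodel' are placeholders for my files.

import numpy as np
import caffe

# Load the network (file names are placeholders for my model).
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Round every parameter blob through float16 and back to float32.
# This only simulates the precision loss of fp16 storage; the blobs
# are still held in 32-bit memory.
for name, params in net.params.items():
    for p in params:
        p.data[...] = p.data.astype(np.float16).astype(np.float32)

# Forward pass with the rounded weights, to compare classification
# results against the unmodified network.
out = net.forward()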
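
And for the low-rank idea, roughly this (again just a sketch, reusing the `net` from above; the rank k = 256 is an arbitrary choice, and actually using the factorization would mean replacing fc6 with two stacked layers):

# Truncated SVD of the convolutional fc6 weights.
W = net.params['fc6'][0].data        # shape (num_output, channels, kh, kw)
W2d = W.reshape(W.shape[0], -1)      # flatten to a 2-D matrix
U, S, Vt = np.linalg.svd(W2d, full_matrices=False)

k = 256                              # arbitrary rank, for illustration
A = U[:, :k] * S[:k]                 # num_output x k
B = Vt[:k, :]                        # k x (channels * kh * kw)

# Parameter count drops from W2d.size to A.size + B.size, i.e. the
# single fc6 layer becomes two smaller stacked layers (B then A).
print(W2d.size, A.size + B.size)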
I also can't explain where all the memory is coming from - my calculations don't quite add up to the total usage. The data size is ~113MB, the param size is ~227MB, and I guess there's a col_buffer somewhere that's eating a lot. Is that really it, though? Python reports ~616MB just to feed in a single image and do a forward pass. And do the Split layers also take memory, or are they just pointers to the same data?
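
For reference, this is roughly how I'm getting the ~113MB / ~227MB numbers (same `net` as above, counting 4 bytes per float32 element; it only counts the forward data arrays and the weights, nothing else):

def mb(nbytes):
    return nbytes / (1024.0 ** 2)

# Sum of all activation blobs (net.blobs) and all parameter blobs.
data_bytes = sum(b.data.size * 4 for b in net.blobs.values())
param_bytes = sum(p.data.size * 4
                  for params in net.params.values() for p in params)

print('blob data : %.1f MB' % mb(data_bytes))
print('parameters: %.1f MB' % mb(param_bytes))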
#disclaimer - I'm not a computer scientist (I studied physics), so hopefully these aren't idiotic questions.
Thanks for your help
Ellery