Questions about pruning, sparsity and compression


Denis

Aug 1, 2017, 4:33:20 AM
to Caffe Users
Hello,

I have some questions concerning the terms used in the title of this post. I am not sure I understand their meaning, so please correct me if I am wrong.

I have a CNN trained with the current Caffe library.
In this net, I noticed that a lot of weights are close to zero, and if I set them to zero, the output of the net is nearly unchanged.

Network pruning consists of removing connections with very small weights, because these weights have only a very limited influence on the output.
By removing these connections, the size of the net on disk (and in memory), i.e. the number of weights to store, can be reduced. This is network compression.
The computation time can also be reduced. For example, a 3x3 convolution with half of its weights removed could be computed faster, because some multiply/add operations can be avoided.
These are called sparse operations, because only the non-zero weights are used.
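To make it concrete, the kind of pruning I have in mind looks like this (a toy numpy sketch; the threshold value is arbitrary and just for illustration):

```python
import numpy as np

def prune_by_magnitude(weights, threshold=1e-2):
    """Zero out all weights whose magnitude falls below the threshold."""
    pruned = weights.copy()
    mask = np.abs(pruned) < threshold
    pruned[mask] = 0.0
    sparsity = mask.mean()  # fraction of weights that were zeroed
    return pruned, sparsity

# Toy 3x3 convolution kernel with several near-zero weights.
kernel = np.array([[ 0.40, -0.003,  0.25],
                   [ 0.001,  0.30, -0.002],
                   [-0.15,  0.005,  0.20]])
pruned, sparsity = prune_by_magnitude(kernel)
print(sparsity)  # 4 of the 9 weights fall below the threshold
```

The question, of course, is whether zeroing weights like this actually buys anything in Caffe, which is what I ask below.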

Did I understand these concepts correctly?
In practice, I haven't found any simple way to apply these optimizations.

1- In the current Caffe 1 version, if I just set small weights to 0, does Caffe automatically take advantage of this in computation and storage?

2- Are there special layers that store / use only the useful weights instead of all the weights?

3- Is there a tool to create a sparse / pruned model from a dense / unpruned model? Is there a special training procedure proposed in Caffe?

4- Does the new Caffe 2 library offer sparse training / testing? Where can I find the procedure to achieve this?

5- Did you manage to apply pruning to a net, and what is the order of computation-time savings?

Thanks a lot for your help,
Denis



Przemek D

Jan 18, 2018, 10:05:06 AM
to Caffe Users
I don't know about Caffe2, but current Caffe does not support any kind of sparsity.
One thing you have to know about processing sparse matrices is that the benefit does not come simply from the fact that some of their entries are zero. In fact, if you wrote a function that processes a matrix and "only performs the operation on non-zero elements", you would actually lose performance: unless the matrices involved contain a huge proportion of zeros, checking whether each element is zero or not introduces a large overhead.

Speed-ups can only be achieved if you store your matrix in such a way that it does not even contain those zeros. One example of such a storage method is the CSR (compressed sparse row) format - you basically only store the indices and values of the non-zero elements. The problem with such matrices is that all operations involving them have to be written from the ground up, since you can no longer simply iterate over the elements. Caffe does not have the infrastructure to do that, and is very unlikely to ever have it. After all, the basic unit of data flow in Caffe is a Blob, which stores all elements as C-contiguous dense arrays - and all processing is designed for this type of storage.
You can, however, approach pruning at the neuron level. I have successfully performed net surgeries that remove entire slices of convolutional layers, for example extracting only the N most important filters from a layer. It's a pretty simple operation (if you can handle some matrix algebra) and can bring huge speedups (I managed to get below 5 ms inference on a network that originally propagated in about 20 ms).
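As a rough sketch, the filter-selection step looks something like this in plain numpy (ranking filters by L1 norm is just one possible importance measure, not necessarily the best one):

```python
import numpy as np

def top_filters(conv_weights, n_keep):
    """Rank the filters of a conv layer (shape: out_ch, in_ch, kH, kW)
    by L1 norm and return the weights of the n_keep most important ones."""
    importance = np.abs(conv_weights).sum(axis=(1, 2, 3))
    keep = np.argsort(importance)[::-1][:n_keep]
    keep.sort()  # preserve the original filter ordering
    return conv_weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 3, 3, 3))  # 64 filters, 3 input channels, 3x3
pruned_w, kept = top_filters(w, n_keep=16)
print(pruned_w.shape)  # only 16 filters remain
```

Keep in mind that after slicing a layer like this, the next layer's weights must be sliced along their input-channel axis to match, otherwise the shapes no longer line up.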