I don't know about Caffe2, but current Caffe does not support any kind of sparsity.
One thing you have to know about processing sparse matrices is that the benefit comes not merely from the fact that some of their entries are zero. If you wrote a function that processes a matrix and "only performs the operation on non-zero elements", you would actually lose performance: unless the matrix contains a huge proportion of zeros, the very branch that checks whether each element is zero adds more overhead than skipping it saves. Speed-ups are only possible if you store the matrix in a way that does not contain those zeros at all. One example of such a storage scheme is the CSR (compressed sparse row) format: you store only the indices and values of the non-zero elements. The catch is that every operation involving such matrices has to be written from the ground up, since you can no longer simply iterate over the elements. Caffe does not have that infrastructure and is very unlikely to ever gain it. After all, the basic unit of data flow in Caffe is the Blob, which stores all elements as C-contiguous dense arrays, and all processing is designed around that storage.
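To make the CSR idea concrete, here is a minimal sketch using `scipy.sparse` (not Caffe; the matrix contents are made up for illustration). It shows that CSR keeps only the non-zero values plus two index arrays, so an operation like a matrix-vector product never even visits the zeros:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-zero matrix stored densely wastes space and compute.
dense = np.zeros((4, 5))
dense[0, 1] = 3.0
dense[2, 3] = 7.0
dense[3, 0] = 1.0

# CSR keeps only the non-zeros plus bookkeeping indices.
sparse = csr_matrix(dense)
print(sparse.data)     # the non-zero values: [3. 7. 1.]
print(sparse.indices)  # column index of each value: [1 3 0]
print(sparse.indptr)   # where each row starts in `data`: [0 1 1 2 3]

# Matrix-vector product touches only the stored entries,
# yet gives the same result as the dense computation.
v = np.ones(5)
print(sparse @ v)      # equals dense @ v
```

Note how `sparse @ v` dispatches to CSR-specific code: this is exactly the "written from the ground up" requirement, and Caffe's dense-Blob kernels have no such counterpart.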
You can, however, approach pruning at the neuron level. I have successfully performed net surgery to remove entire slices of convolutional layers, for example extracting only the N most important filters from a layer. It's a fairly simple operation (if you can handle some matrix algebra) and can bring huge speed-ups: I got inference below 5 ms on a network that originally propagated in about 20 ms.
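The slicing involved can be sketched with plain NumPy. This is not Caffe code: the array shapes are the usual (out_channels, in_channels, kH, kW) convolution layout, and the L1-norm importance score is just one common heuristic I'm assuming here for illustration. In practice you would copy the sliced arrays into the params of a smaller network definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv-layer weights and biases.
W = rng.standard_normal((64, 32, 3, 3))  # 64 filters over 32 input channels
b = rng.standard_normal(64)

# Rank filters by the L1 norm of their weights (an assumed importance
# proxy) and keep only the N strongest, preserving their original order.
N = 16
importance = np.abs(W).reshape(W.shape[0], -1).sum(axis=1)
keep = np.sort(np.argsort(importance)[-N:])

W_pruned = W[keep]   # slice out entire filters -> (16, 32, 3, 3)
b_pruned = b[keep]   # matching biases          -> (16,)

# Crucially, the NEXT layer consumed 64 input channels, so its weights
# must drop the same channels to stay consistent.
W_next = rng.standard_normal((128, 64, 3, 3))
W_next_pruned = W_next[:, keep]  # -> (128, 16, 3, 3)

print(W_pruned.shape, b_pruned.shape, W_next_pruned.shape)
```

Because the pruned layers remain ordinary dense arrays, Caffe's existing dense kernels run them unchanged, which is why this approach gives real speed-ups where element-wise sparsity would not.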