Possible ways to achieve run-time improvements during deployment?

Kanguru

Oct 11, 2017, 4:39:21 PM

to Caffe Users

Hey everyone,

I am currently trying to deploy a deep learning net (U-net) for semantic segmentation and want to know whether there are ways to improve run time during deployment.

What I've done so far is clean up the Net.prototxt so that it contains only the layers I need for deployment. I have also increased the input tile size to the maximum the GPU (GeForce GTX 970) can process per pass.
Since the images I want to segment are quite large, I still need about 90 seconds in total, including I/O and stitching, for an 8000x8000 output (196 passes through the net). I assign to and read from the network using

net.blobs['data'].data[...] = data
net.forward()
output = net.blobs['argmax'].data[...]

in a loop.
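Since the 90 seconds covers I/O, the forward passes, and stitching together, it may help to measure each stage separately before optimizing. The following is a minimal sketch, not part of the original post: `profile_tiles` and the stage names are hypothetical, and the commented wiring assumes `net` is the `caffe.Net` used in the loop above.

```python
import time
from collections import defaultdict


def profile_tiles(tiles, stages):
    """Run every tile through the stages in order and sum wall-clock time per stage.

    `stages` is a list of (name, callable) pairs; each callable receives the
    result of the previous stage (the first one receives the tile itself).
    Returns (results, totals), where totals maps stage name to seconds.
    """
    totals = defaultdict(float)
    results = []
    for tile in tiles:
        value = tile
        for name, fn in stages:
            start = time.perf_counter()
            value = fn(value)
            totals[name] += time.perf_counter() - start
        results.append(value)
    return results, dict(totals)


# Hypothetical wiring for the pycaffe loop from the post:
#
#   def copy_in(data):
#       net.blobs['data'].data[...] = data
#   def forward(_):
#       net.forward()
#   def copy_out(_):
#       return net.blobs['argmax'].data.copy()
#
#   stages = [("copy_in", copy_in), ("forward", forward), ("copy_out", copy_out)]

if __name__ == "__main__":
    # Stand-in stages so the sketch runs without Caffe installed.
    stages = [("double", lambda x: x * 2), ("increment", lambda x: x + 1)]
    results, totals = profile_tiles(range(5), stages)
    for name, seconds in totals.items():
        print(f"{name}: {seconds:.6f} s")
```

GPU work can finish asynchronously, so the times attributed to the forward pass and to reading the output blob are probably best read together rather than individually.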
Apart from switching to C++, installing fb-caffe-exts (to further increase the input tile size), and upgrading the GPU (I get around 50 seconds on a GeForce Titan X), I am out of ideas on what to do.

Any help or tips are much appreciated! :)

Przemek D

Oct 12, 2017, 3:45:20 AM

to Caffe Users

U-net is based on VGG, which is very memory- and computation-intensive. I've achieved considerable speedups by removing the innermost layers, but my task was relatively simple and didn't need the full power (and capacity) of VGG, so this didn't cost me much accuracy.
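To get a rough feel for what removing the innermost layers saves, one can count weights and multiply-adds per encoder level. This is only a back-of-the-envelope sketch under assumptions not stated in the thread: two 3x3 convolutions per level, channels doubling from 64, 2x2 pooling between levels, "same" padding, and no decoder or biases. The function name is hypothetical.

```python
def unet_encoder_costs(height, width, in_channels=1, base_channels=64,
                       depth=5, kernel=3):
    """Return a list of (level, mult_adds, weights) for a U-net-like encoder.

    Assumes two kernel x kernel convolutions per level, channel count
    doubling at every level, and spatial size halving between levels.
    """
    costs = []
    c_in = in_channels
    for level in range(depth):
        c_out = base_channels * 2 ** level
        pixels = (height >> level) * (width >> level)  # after `level` poolings
        weights = kernel * kernel * (c_in * c_out + c_out * c_out)
        costs.append((level, pixels * weights, weights))
        c_in = c_out
    return costs


if __name__ == "__main__":
    costs = unet_encoder_costs(512, 512)
    total_macs = sum(m for _, m, _ in costs)
    total_weights = sum(w for _, _, w in costs)
    for level, macs, weights in costs:
        print(f"level {level}: {macs / total_macs:.1%} of mult-adds, "
              f"{weights / total_weights:.1%} of weights")
```

Under these assumptions the multiply-adds come out roughly equal per level, while most of the weights sit in the innermost level, so dropping it would mainly free memory (which could allow larger tiles) and save a smaller share of compute. The real network should be measured rather than trusted to this estimate.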

Kanguru

Oct 12, 2017, 5:37:43 AM

to Caffe Users

Ah okay thanks, guess I will look into adjusting model complexity to the given problem! :)

Jonathan R. Williford

Oct 12, 2017, 6:19:27 AM

to Kanguru, Caffe Users

I suspect that adopting some of the ResNeXt architecture would be helpful:

https://arxiv.org/abs/1611.05431

In the paper, they keep the amount of computation constant but increase the model accuracy. By using a smaller "cardinality" (as defined by the authors), you might be able to keep your model accuracy the same while reducing the complexity.
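To make the trade-off concrete, the ResNeXt paper counts the parameters of one bottleneck block as C * (256*d + 3*3*d*d + d*256), where C is the cardinality and d the width of each path. A small sketch of that count follows; the function name is hypothetical, and the 256 input/output channels are the paper's example, not a U-net setting.

```python
def resnext_block_params(cardinality, group_width, channels=256, kernel=3):
    """Parameter count of one ResNeXt bottleneck block (biases ignored).

    Each of the `cardinality` paths is a 1x1 conv down to `group_width`
    channels, a kernel x kernel conv at that width, and a 1x1 conv back up.
    """
    per_path = (channels * group_width
                + kernel * kernel * group_width * group_width
                + group_width * channels)
    return cardinality * per_path


if __name__ == "__main__":
    # Paper's example: cardinality 32 with width 4 roughly matches a
    # plain bottleneck (one path of width 64).
    print(resnext_block_params(32, 4))
    print(resnext_block_params(1, 64))
    # Halving the cardinality at the same width halves the parameters.
    print(resnext_block_params(16, 4))
```

In Caffe, this kind of aggregated block is usually expressed with the `group` field of the convolution layer. Whether accuracy holds up at a lower cardinality would have to be checked on the actual segmentation task.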

You might also want to look into pruning and compression algorithms. I don't know which methods are best, but here are some links to get some idea of this topic:

https://arxiv.org/abs/1705.07356

https://jacobgil.github.io/deeplearning/pruning-deep-learning
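As a rough illustration of what filter pruning involves, the sketch below ranks the filters of one convolution layer by the L1 norm of their weights and keeps the largest ones. This is a generic magnitude-based criterion chosen for simplicity, not necessarily the method used in either link, and the function names are hypothetical.

```python
def rank_filters_by_l1(filters):
    """Return filter indices ordered from smallest to largest L1 norm.

    `filters` is a list of filters, each given as a flat list of weights
    (for pycaffe, for example, flattened rows of net.params[name][0].data).
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    return sorted(range(len(filters)), key=lambda i: norms[i])


def filters_to_keep(filters, fraction):
    """Return the indices that survive after dropping the given fraction
    of filters with the smallest L1 norm."""
    n_drop = int(len(filters) * fraction)
    dropped = set(rank_filters_by_l1(filters)[:n_drop])
    return [i for i in range(len(filters)) if i not in dropped]


if __name__ == "__main__":
    toy_filters = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-3.0, 0.5]]
    print(filters_to_keep(toy_filters, 0.5))
```

Removing a filter also removes the matching input channel of the following layer, and a pruned network typically needs fine-tuning to recover accuracy.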

You might also want to ask on one of the SE sites, perhaps https://stats.stackexchange.com, for ideas on modifying your model. If you decide to do this, please post the URL here; I would be interested in seeing other people's answers.

I hope this helps!

Jonathan
