Benchmarking inference speed w/ SqueezeNet models and w/ Ristretto-quantized SqueezeNet models for CPU (not GPU)


auro.t...@gmail.com

Jan 16, 2017, 2:13:56 AM
to ristretto-users
Hi all,

The reason I ask this question is that in CPU mode, the Ristretto-quantized SqueezeNet model takes much longer than the 'regular' SqueezeNet model (116.5676 s for 50 images with Ristretto-quantized SqueezeNet vs. 23.2358 s for 50 images with the 'regular' SqueezeNet model).

In GPU mode, the benchmark looks a lot better (0.3671 s for 50 images with Ristretto-quantized SqueezeNet vs. 1.9696 s for 50 images with the 'regular' SqueezeNet model).

So I'm asking myself how this is possible, unless I introduced a bug.

If anyone else has benchmarked CPU and GPU mode, kindly post your results.

Thank you.
Auro

I'm on a high-end Xeon box, and the benchmarks include the image resizing time and the 'transformer' processing time as well.
I'm on Linux 14.04 with a Titan X GPU card.

pmg...@ucdavis.edu

Jan 20, 2017, 3:53:49 PM
to ristretto-users
Hi Auro,

Ristretto simulates fixed-point models using floating-point math, so you can't get a speedup with Ristretto. Under the hood, Ristretto layers quantize their inputs, do the normal forward pass in floating point, and finally quantize the output values. If you are looking for a speedup from actually using 8-bit fixed-point networks, TensorFlow might help.
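To make that concrete, here is a minimal sketch of what "simulating fixed point in floating point" means (this is not Ristretto's actual code, and the 8-bit width with 4 fractional bits is just an example format):

import numpy as np

def quantize_dynamic_fixed_point(x, bit_width=8, frac_bits=4):
    # Round x onto a signed fixed-point grid, but keep it stored as float.
    # The grid has 2**bit_width levels spaced 2**-frac_bits apart, so the
    # result is still an ordinary float array.
    step = 2.0 ** -frac_bits
    max_val = (2.0 ** (bit_width - 1) - 1) * step
    min_val = -(2.0 ** (bit_width - 1)) * step
    return np.clip(np.round(x / step) * step, min_val, max_val)

def simulated_quantized_layer(inputs, weights):
    # Forward pass of a "quantized" layer, the way a simulator does it.
    q_in = quantize_dynamic_fixed_point(inputs)   # quantize inputs
    q_w  = quantize_dynamic_fixed_point(weights)  # quantize parameters
    out  = q_in @ q_w                             # normal floating-point math
    return quantize_dynamic_fixed_point(out)      # quantize outputs

Since the multiply-accumulate still runs in floating point, plus the extra rounding before and after, a simulated layer can never be faster than the plain floating-point layer.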

Best,
Philipp

Wang Bo

Jun 16, 2017, 2:31:32 AM
to ristretto-users
Hi Philipp,
    I want to use Ristretto to speed up the SqueezeNet model on a mobile platform. As you have said, Ristretto can't help me with a speedup, so I wonder: what is the purpose of Ristretto? Can I get a speedup on mobile platforms?

pmg...@ucdavis.edu

Jun 19, 2017, 3:49:31 PM
to ristretto-users
Hi Wang,

Thanks for the question. That's correct, Ristretto does not lead to a direct speedup on mobile devices. Instead, Ristretto helps a developer find a good trade-off between bit-width reduction and network accuracy. Ristretto is optimized for workstations with a GPU, so it can simulate fixed-point networks with quick turnaround. It can also generate a prototxt for you that contains the fixed-point formats for parameters and activations.
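To make the trade-off idea concrete, here is a rough sketch of the kind of search Ristretto automates (this is not Ristretto's API; evaluate_accuracy is a placeholder for scoring the simulated fixed-point network on a validation set, and the error margin is just an example value):

def find_smallest_bit_width(evaluate_accuracy, baseline_accuracy,
                            error_margin=0.02):
    # Return the smallest bit-width whose simulated accuracy stays within
    # error_margin of the floating-point baseline. evaluate_accuracy(bits)
    # is supplied by the caller; the Ristretto equivalent of that step is
    # running the quantized network over a validation set on the GPU.
    chosen = 32                    # fall back to full precision
    for bits in (16, 8, 4, 2):     # try progressively narrower formats
        if evaluate_accuracy(bits) >= baseline_accuracy - error_margin:
            chosen = bits          # still accurate enough, keep shrinking
        else:
            break                  # accuracy dropped too far, stop here
    return chosen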

As for your mobile device, you'll want to use actual fixed-point arithmetic (for example, 8-bit weights and activations). You will have to either find a suitable library that makes it easy to deploy your network on your specific target, or write a runtime yourself and optimize it for your specific use case. What accelerator do you have on your mobile device, if I may ask?
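For a rough picture of what actual fixed-point arithmetic looks like on the device (a generic sketch, not code from any particular mobile library or from Ristretto): weights and activations are stored as 8-bit integers, the multiply-accumulate runs in 32-bit integer precision, and the result is rescaled back to 8 bits.

import numpy as np

def int8_fc_layer(x_q, w_q, out_scale):
    # x_q, w_q: int8 activations / weights, quantized offline.
    # out_scale: float scale mapping the int32 accumulator back to int8.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)  # 32-bit accumulation
    out = np.round(acc * out_scale)                     # requantize
    return np.clip(out, -128, 127).astype(np.int8)

# Toy usage: a 1x4 activation row times a 4x3 weight matrix.
x_q = np.array([[12, -7, 3, 100]], dtype=np.int8)
w_q = np.array([[ 5, -2,  7],
                [ 1,  9, -3],
                [-4,  6,  2],
                [ 8,  0, -1]], dtype=np.int8)
print(int8_fc_layer(x_q, w_q, out_scale=1 / 64))

The speedup (and the roughly 4x smaller memory footprint versus float32) comes from integer kernels like this one, which is exactly the part a library or a hand-written runtime has to provide for your target.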

I hope that helps,
Philipp

linpen...@gmail.com

Nov 9, 2018, 3:10:06 AM
to ristretto-users
I ran make all, make test, and make runtest, and all of them finished without errors. I tried going through the SqueezeNet example, but I get stuck running "00_quantize_squeezenet.sh". The error is as follows.

 im2col.cu:61: Check failed: error == cudaSuccess (7 vs. 0)  too many resources requested for launch.

The same error occurs when I run "00_quantize_lenet.sh". The GPU I am using is a GTX 1080. The detailed error is below.

 [im2col.cu:61] Check failed: error == cudaSuccess (7 vs. 0)  too many resources requested for launch
*** Check failure stack trace: ***
    @     0x7f2755de35cd  google::LogMessage::Fail()
    @     0x7f2755de5433  google::LogMessage::SendToLog()
    @     0x7f2755de315b  google::LogMessage::Flush()
    @     0x7f2755de5e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2756518268  caffe::im2col_gpu<>()
    @     0x7f27563cda05  caffe::BaseConvolutionLayer<>::conv_im2col_gpu()
    @     0x7f27563ccf8b  caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
    @     0x7f27565633a6  caffe::ConvolutionRistrettoLayer<>::Forward_gpu()
    @     0x7f27563e041c  caffe::Layer<>::Forward()
    @     0x7f27564b0b77  caffe::Net<>::ForwardFromTo()
    @     0x7f27564b07df  caffe::Net<>::Forward()
    @     0x7f27564b0e32  caffe::Net<>::Forward()
    @     0x7f27564ea1a1  Quantization::RunForwardBatches()
    @     0x7f27564eadef  Quantization::Quantize2DynamicFixedPoint()
    @     0x7f27564e995e  Quantization::QuantizeNet()
    @           0x403ef0  quantize()
    @           0x4041bd  main
    @     0x7f2755071830  __libc_start_main
    @           0x403779  _start
    @              (nil)  (unknown)
Aborted (core dumped)

md.amir.s...@gmail.com

Nov 9, 2018, 3:29:38 AM
to ristretto-users
If you are using this shell script (https://github.com/pmgysel/caffe/blob/master/examples/ristretto/00_quantize_squeezenet.sh), then remove '--gpu=0' from it. Hope that solves it.

linpen...@gmail.com

Nov 10, 2018, 2:31:25 AM
to ristretto-users
Thank you very much for your advice. 

Everything runs OK when I remove '--gpu=0' from the shell script, but then it uses the CPU for the quantization process, so it is too slow when I quantize other, larger CNN models such as AlexNet, which is what I am using.

If I want to use the GPU for my project, is there any other way to solve the error?

BR.