Benchmarking inference speed w/ SqueezeNet models and w/ Ristretto-quantized SqueezeNet models for CPU (not GPU)


auro.t...@gmail.com

Jan 16, 2017, 2:13:56 AM
to ristretto-users
Hi all,

The reason I ask this question is that in CPU mode, the Ristretto-quantized SqueezeNet model takes much longer than the 'regular' SqueezeNet model (116.5676 s for 50 images with Ristretto-quantized SqueezeNet vs. 23.2358 s for 50 images with the 'regular' SqueezeNet model).

In GPU mode, the benchmark looks a lot better (0.3671 s for 50 images with Ristretto-quantized SqueezeNet vs. 1.9696 s for 50 images with the 'regular' SqueezeNet model).

So I'm asking myself how this is possible, unless I introduced a bug.

If anyone else has benchmarked CPU and GPU mode, kindly post your results.

Thank you.
Auro

I'm on a high-end Xeon box, and the benchmarks include the image resizing time and the 'transformer' processing time as well.
I'm on Linux 14.04 with a Titan X GPU card.

pmg...@ucdavis.edu

Jan 20, 2017, 3:53:49 PM
to ristretto-users
Hi Auro,

Ristretto simulates fixed-point models using floating-point math, so you can't get a speedup with Ristretto. Under the hood, Ristretto layers quantize their inputs, do the normal forward pass in floating point, and finally quantize the output values. If you are looking for a speedup from actually using 8-bit fixed-point networks, TensorFlow might help.
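To make that concrete, here is a minimal sketch of what "simulating fixed point in floating point" means (this is not Ristretto's actual code, and the 8-bit width with 4 fractional bits is just an example format):

import numpy as np

def quantize_dynamic_fixed_point(x, bit_width=8, frac_bits=4):
    # Round x onto a signed fixed-point grid, but keep it stored as float.
    # The grid has 2**bit_width levels spaced 2**-frac_bits apart, so the
    # result is still an ordinary float array.
    step = 2.0 ** -frac_bits
    max_val = (2.0 ** (bit_width - 1) - 1) * step
    min_val = -(2.0 ** (bit_width - 1)) * step
    return np.clip(np.round(x / step) * step, min_val, max_val)

def simulated_quantized_layer(inputs, weights):
    # Forward pass of a "quantized" layer, the way a simulator does it.
    q_in = quantize_dynamic_fixed_point(inputs)   # quantize inputs
    q_w  = quantize_dynamic_fixed_point(weights)  # quantize parameters
    out  = q_in @ q_w                             # normal floating-point math
    return quantize_dynamic_fixed_point(out)      # quantize outputs

Since the multiply-accumulate still runs in floating point, plus the extra rounding before and after, a simulated layer can never be faster than the plain floating-point layer.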

Best,
Philipp

Wang Bo

Jun 16, 2017, 2:31:32 AM
to ristretto-users
Hi Philipp,
    I want to use Ristretto to speed up the SqueezeNet model on a mobile platform. As you have said, Ristretto can't help me with a speedup, so I wonder: what is the purpose of Ristretto? Can I get a speedup on mobile platforms?

pmg...@ucdavis.edu

Jun 19, 2017, 3:49:31 PM
to ristretto-users
Hi Wang,

Thanks for the question. That's correct, Ristretto does not lead to a direct speedup on mobile devices. Instead, Ristretto helps a developer find a good trade-off between bit-width reduction and network accuracy. Ristretto is optimized for workstations with a GPU, so it can simulate fixed-point networks with quick turnaround. It can also generate a prototxt for you that contains the fixed-point formats for parameters and activations.
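To make the trade-off idea concrete, here is a rough sketch of the kind of search Ristretto automates (this is not Ristretto's API; evaluate_accuracy is a placeholder for scoring the simulated fixed-point network on a validation set, and the error margin is just an example value):

def find_smallest_bit_width(evaluate_accuracy, baseline_accuracy,
                            error_margin=0.02):
    # Return the smallest bit-width whose simulated accuracy stays within
    # error_margin of the floating-point baseline. evaluate_accuracy(bits)
    # is supplied by the caller; the Ristretto equivalent of that step is
    # running the quantized network over a validation set on the GPU.
    chosen = 32                    # fall back to full precision
    for bits in (16, 8, 4, 2):     # try progressively narrower formats
        if evaluate_accuracy(bits) >= baseline_accuracy - error_margin:
            chosen = bits          # still accurate enough, keep shrinking
        else:
            break                  # accuracy dropped too far, stop here
    return chosen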

As for your mobile device, you'll want to use actual fixed-point arithmetic (for example, 8-bit weights and activations). You will have to either find a suitable library that makes it easy to deploy your network on your specific target, or write a runtime yourself and optimize it for your specific use case. What accelerator do you have on your mobile device, if I may ask?
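For a rough picture of what actual fixed-point arithmetic looks like on the device (a generic sketch, not code from any particular mobile library or from Ristretto): weights and activations are stored as 8-bit integers, the multiply-accumulate runs in 32-bit integer precision, and the result is rescaled back to 8 bits.

import numpy as np

def int8_fc_layer(x_q, w_q, out_scale):
    # x_q, w_q: int8 activations / weights, quantized offline.
    # out_scale: float scale mapping the int32 accumulator back to int8.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)  # 32-bit accumulation
    out = np.round(acc * out_scale)                     # requantize
    return np.clip(out, -128, 127).astype(np.int8)

# Toy usage: a 1x4 activation row times a 4x3 weight matrix.
x_q = np.array([[12, -7, 3, 100]], dtype=np.int8)
w_q = np.array([[ 5, -2,  7],
                [ 1,  9, -3],
                [-4,  6,  2],
                [ 8,  0, -1]], dtype=np.int8)
print(int8_fc_layer(x_q, w_q, out_scale=1 / 64))

The speedup (and the roughly 4x smaller memory footprint versus float32) comes from integer kernels like this one, which is exactly the part a library or a hand-written runtime has to provide for your target.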

I hope that helps,
Philipp

linpen...@gmail.com

Nov 9, 2018, 3:10:06 AM
to ristretto-users
I ran make all, make test, and make runtest, and all of them finished without errors. I tried going through the SqueezeNet example, but I get stuck running "00_quantize_squeezenet.sh". The error is as follows.

 im2col.cu:61: Check failed: error == cudaSuccess (7 vs. 0)  too many resources requested for launch.

The same error occurs when I run "00_quantize_lenet.sh". The GPU I am using is a GTX 1080. The detailed error is below.

 [im2col.cu:61] Check failed: error == cudaSuccess (7 vs. 0)  too many resources requested for launch
*** Check failure stack trace: ***
    @     0x7f2755de35cd  google::LogMessage::Fail()
    @     0x7f2755de5433  google::LogMessage::SendToLog()
    @     0x7f2755de315b  google::LogMessage::Flush()
    @     0x7f2755de5e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2756518268  caffe::im2col_gpu<>()
    @     0x7f27563cda05  caffe::BaseConvolutionLayer<>::conv_im2col_gpu()
    @     0x7f27563ccf8b  caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
    @     0x7f27565633a6  caffe::ConvolutionRistrettoLayer<>::Forward_gpu()
    @     0x7f27563e041c  caffe::Layer<>::Forward()
    @     0x7f27564b0b77  caffe::Net<>::ForwardFromTo()
    @     0x7f27564b07df  caffe::Net<>::Forward()
    @     0x7f27564b0e32  caffe::Net<>::Forward()
    @     0x7f27564ea1a1  Quantization::RunForwardBatches()
    @     0x7f27564eadef  Quantization::Quantize2DynamicFixedPoint()
    @     0x7f27564e995e  Quantization::QuantizeNet()
    @           0x403ef0  quantize()
    @           0x4041bd  main
    @     0x7f2755071830  __libc_start_main
    @           0x403779  _start
    @              (nil)  (unknown)
Aborted (core dumped)

md.amir.s...@gmail.com

Nov 9, 2018, 3:29:38 AM
to ristretto-users
If you are using this shell script (https://github.com/pmgysel/caffe/blob/master/examples/ristretto/00_quantize_squeezenet.sh), then remove '--gpu=0' from it. Hope that solves it.

linpen...@gmail.com

Nov 10, 2018, 2:31:25 AM
to ristretto-users
Thank you very much for your advice. 

Everything runs OK when I remove '--gpu=0' from the shell script, but then it uses the CPU for the quantization process, so it is too slow when I quantize other, larger CNN models such as AlexNet, which is what I am using.

If I want to use the GPU for my project, is there any other way to solve the error?

BR.