Dynamic fixed point quantization of activations leads to catastrophic results

ngal...@gmail.com

Jul 11, 2016, 10:16:29 AM
to ristretto-users

Hello, 


I've run the Ristretto tool on a 152-layer ResNet, with the following results. It seems that 8-bit weights are sufficient for convolutional and fully connected layers. However, dynamic fixed point activations kill the accuracy. Is there a logical explanation for this (perhaps it's to be expected?), and if so, is there a way to enable dynamic fixed point for CONV/FC weights only? I can't seem to find a way to turn dynamic fixed point layer activations on/off individually for the resulting quantized network. 


Thanks! 


--nico


I0708 18:53:41.303802 70469 quantization.cpp:136] accuracy_top5 = 0.005

I0708 18:53:41.788223 70469 quantization.cpp:259] ------------------------------

I0708 18:53:41.788274 70469 quantization.cpp:260] Network accuracy analysis for

I0708 18:53:41.788278 70469 quantization.cpp:261] Convolutional (CONV) and fully

I0708 18:53:41.788281 70469 quantization.cpp:262] connected (FC) layers.

I0708 18:53:41.788285 70469 quantization.cpp:263] Baseline 32bit float: 0.749002

I0708 18:53:41.788295 70469 quantization.cpp:264] Dynamic fixed point CONV

I0708 18:53:41.788297 70469 quantization.cpp:265] weights: 

I0708 18:53:41.788300 70469 quantization.cpp:267] 16bit:        0.749022

I0708 18:53:41.788305 70469 quantization.cpp:267] 8bit:         0.747182

I0708 18:53:41.788308 70469 quantization.cpp:267] 4bit:         0.001

I0708 18:53:41.788312 70469 quantization.cpp:270] Dynamic fixed point FC

I0708 18:53:41.788316 70469 quantization.cpp:271] weights: 

I0708 18:53:41.788317 70469 quantization.cpp:273] 16bit:        0.749022

I0708 18:53:41.788321 70469 quantization.cpp:273] 8bit:         0.749422

I0708 18:53:41.788326 70469 quantization.cpp:273] 4bit:         0.686141

I0708 18:53:41.788329 70469 quantization.cpp:275] Dynamic fixed point layer

I0708 18:53:41.788331 70469 quantization.cpp:276] activations:

I0708 18:53:41.788334 70469 quantization.cpp:278] 16bit:        0.0266003

I0708 18:53:41.788338 70469 quantization.cpp:281] Dynamic fixed point net:

I0708 18:53:41.788341 70469 quantization.cpp:282] 8bit CONV weights,

I0708 18:53:41.788344 70469 quantization.cpp:283] 8bit FC weights,

I0708 18:53:41.788347 70469 quantization.cpp:284] 0bit layer activations:

I0708 18:53:41.788349 70469 quantization.cpp:285] Accuracy: 0.001

I0708 18:53:41.788353 70469 quantization.cpp:286] Please fine-tune.

ngal...@gmail.com

Jul 11, 2016, 11:48:41 AM
to ristretto-users
After closer inspection of the source, I noticed negative numbers for the activation layers' integer bit widths in the resulting quantized.prototxt. Re-running with verbose logging turned on, I also saw zeros and negative numbers in the initial bit width scan, e.g.: 

11932:I0708 16:04:55.599607 70469 quantization.cpp:155] Layer res2b_branch2c, integer length input=4, integer length output=4, integer length parameters=-1
 
For a full list of the integer lengths, see here: 

Is this a bug in Ristretto?

ngal...@gmail.com

Jul 11, 2016, 12:02:02 PM
to ristretto-users

Philipp Gysel

Jul 13, 2016, 12:33:50 PM
to ristretto-users
Hi Nico,

Thanks for pointing out this problem!

Some initial experiments show that ResNet works well with dynamic fixed point parameters and activations for the fully connected layers. However, convolutional layer activations in dynamic fixed point make the accuracy drop drastically. I'm still trying to find the reason for this. I suspect the problem is the skip connections in the residual architecture: the quantization errors propagate directly to the next layers. I'll let you know once I know more.

If you want to simulate quantized parameters but 32-bit floating point activations, you have to change the source code of the Ristretto tool in src/caffe/ristretto/quantization.cpp. The function Quantize2DynamicFixedPoint() chooses the number format (integer and fractional length) for each layer's activations and parameters, then finds suitable bit-widths by trial and error. EditNetDescriptionDynamicFixedPoint() changes the network description to use dynamic fixed point layers; its third parameter indicates whether activations, parameters or both should be quantized. If you use commit f2d1f3, you will need to change line 249.
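
To sketch the idea (the call below is written from memory, so please double-check the exact signature and argument names against your checkout):

    // In Quantization::Quantize2DynamicFixedPoint(), near line 249: the string
    // argument that selects the network part decides what gets quantized.
    // Passing "Parameters" instead of "Parameters_and_Activations" should
    // quantize the CONV/FC weights but leave the activations in 32-bit float.
    EditNetDescriptionDynamicFixedPoint(&param, "Convolution_and_InnerProduct",
        "Parameters" /* was "Parameters_and_Activations" */,
        bw_conv_params_, bw_fc_params_, bw_in_, bw_out_);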

Also, I found a small bug in Ristretto. The tool assumes that 16-bit dynamic fixed point always works, which is not the case for ResNet; that's why the final bit-width chosen for the activations looks random. I will make the necessary changes. However, this bug does not affect the simulation of dynamic fixed point activations (16-bit activations really do lead to a net accuracy of 0.0266003).

ngal...@gmail.com

Jul 19, 2016, 12:13:56 PM
to ristretto-users
Hi Philipp, 

Thinking about this problem a bit more carefully, I'm wondering if it could be related to the fact that, in a residual learning building block, the output of the non-quantized path (the skip connection) and the output of the quantized path (a stack of 2 or 3 convolutional layers) are added together. 

In terms of the paper, H(x) = F(x) + x, and the quantized version H_q(x) = F_q(x_q) + x. Note that in the last formula, the last term is x, not x_q.

Given that the network was trained with x everywhere, perhaps it would work better if we replaced x with x_q (in other words, introduced quantization on the skip connection).

Thoughts? 

ngal...@gmail.com

Jul 19, 2016, 1:18:47 PM
to ristretto-users
On a related note, I'm wondering if there could also be a bug in the dynamic fixed point code, because when testing minifloat quantization I achieve 67% top-1 accuracy (an 8.2% top-1 accuracy loss on ImageNet). 

Theoretically, dynamic fixed point with 16 bits should be able to reach the same accuracy as minifloat FP16, if not better, right?

Nico


pmg...@ucdavis.edu

Jul 22, 2016, 12:22:41 PM
to ristretto-users
Hi Nico,

So I performed some more experiments on ResNet-152. I managed to get good results with 16-bit fixed point activations and 8-bit dynamic fixed point parameters. However, I didn't manage to reduce the activations to 8-bit.

1) (Dynamic) fixed point quantization:
Due to the specific architecture of ResNet, Ristretto's normal quantization strategy doesn't work well. Thanks for pointing this out. With the residual architecture, the first convolutional layer's output "skips" all the way to the last fully connected layer (ignoring pooling, ReLU and batch normalization layers). In effect we add quantization error to each layer output, and these errors all add up toward the top of the network. Maybe that is a good intuition for why ResNet quantization is harder. In other networks like GoogLeNet, the quantization errors of each layer "cancel each other out" to a large degree: each layer output goes through further convolutional layers, where the quantization error gets "reduced".

I found better quantization results when using static fixed point for activations. That is, I used 16-bit fixed point for the activations of convolutional and fully connected layers, with all layers sharing the same number format (integer length, fractional length).
The dynamic range of layer outputs is quite significant. Some layers have their largest inputs at about 2^10, whereas other layers have their largest value at around 2^3. If we choose a layer-specific (dynamic) fixed point format, we would quantize to [integer length, fractional length] = [10, 6] in some layers, and in others we'd use [3, 13]. Now since one layer's output later becomes another layer's input (skip connection), this means we first keep only 6 fractional bits, and then only 3 integer bits, so in the end this is similar to having only 9 bits to represent the number. Maybe that gives an intuition for why it's so hard to represent activations with only 8 bits.
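
To make the [integer length, fractional length] notation concrete, here is a small standalone sketch (not Ristretto's code; the sign-bit convention may differ from Ristretto's) of round-to-nearest quantization to such a format:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Quantize v to a signed fixed point format with il integer bits
    // (counting the sign bit) and fl fractional bits: round to the nearest
    // step of 2^-fl and saturate on overflow.
    float to_fixed(float v, int il, int fl) {
      float step = std::ldexp(1.0f, -fl);               // 2^-fl
      float max_val = std::ldexp(1.0f, il - 1) - step;  // largest positive value
      float min_val = -std::ldexp(1.0f, il - 1);        // most negative value
      float q = std::round(v / step) * step;
      return std::min(std::max(q, min_val), max_val);
    }

    int main() {
      // [10, 6] covers roughly +/-512 in steps of 1/64, while [3, 13] covers
      // only roughly +/-4 in steps of 1/8192, so a value near 2^8 survives the
      // first format but saturates in the second.
      std::printf("%f %f\n", to_fixed(300.1f, 10, 6), to_fixed(300.1f, 3, 13));
      return 0;
    }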

So finally, here is the best quantization strategy we found so far:
For parameters of convolutional and fully connected layers, we use 8-bit dynamic fixed point. For the number format, we use the normal strategy (use enough integer bits to cover the large values).
For activations of convolutional and fully connected layers, we use 16-bit fixed point. We choose the same number format for all inputs and outputs: [10, 6].
We quantized ResNet this way and tested on the validation data set. Top-1 results:
Baseline (32-bit floating point): 74.9002%
Fixed point (as described above): 74.5503%

2) Minifloat:
You mention you see an accuracy drop of 8.2% when using half precision. Here Ristretto doesn't do a good job of choosing the number format. You should use 5 exponent bits instead of the 4 exponent bits that Ristretto chooses. If you do so, you will see pretty much the same accuracy as 32-bit floating point. In the source code, this corresponds to setting exp_bits_ to 5.
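
As a rough back-of-the-envelope check (assuming an IEEE-like exponent bias of 2^(e-1)-1, which may not match Ristretto's minifloat convention exactly): with e exponent bits, the largest representable magnitude is about 2 * 2^(2^(e-1)-1). For e = 4 that is roughly 2^8 = 256, while for e = 5 it is roughly 2^16 (about 65500, as in half precision). Since some layer outputs reach about 2^10 = 1024 (see above), 4 exponent bits overflow, while 5 exponent bits leave plenty of headroom.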

3) Quantized skip connections:
You mention in your comments that we should pay attention to simulating quantized skip connections. In ResNet, convolutional layers are followed by batch normalization layers, and skip connections add up the results of two different normalization layers. If you want to really simulate a hardware accelerator, you might want to simulate quantized normalization layers; for that purpose, you could write your own BatchNormRistretto layer. Be warned though, these normalization layers are a little tricky to quantize.

I hope this helps?
Philipp

ngal...@gmail.com

Jul 22, 2016, 3:18:40 PM
to ristretto-users
Hello Philipp, 

Thank you for this detailed analysis - this is very helpful. I will try to reproduce your results soon. Touching on your comments, I have some follow-up thoughts: 

  1. (Dynamic) fixed point quantization

    I buy your argument that low precision behavior is different for ResNets due to the fact that the quantization error skips directly to subsequent layers. This is in line with Pete Warden's thesis that we've trained the convolutional layers to handle noise in the inputs, and we can think of quantization as a form of noise. For ResNets, however, the problem is indeed that some of that "noise" (quantization error) is fed (almost) straight to the top. As a small side note, I believe the output of the first convolutional layer doesn't skip all the way to the top; there are still a couple of CONV layers in between (e.g. res{2,3,4}a_branch1), but overall your comment holds.

    Regarding dynamic vs. static fixed point: if we're combining [10, 6] and [3, 13] type numbers, why does that mean we effectively end up with only 9 bits? While that might be true in a hardware implementation that only has 16-bit-wide ALUs, as far as I can see from the Ristretto code, all computations are done in FP32. Can you explain your argument a bit further? 


  2. Minifloat: I'm assuming that there is overflow due to 4 exponent bits (vs 5). Why didn't the data analysis in Quantization::Quantize2MiniFloat() detect this, and set it to 5 automatically? 

  3. Quantized skip connections: Can you elaborate on the challenges with quantizing normalization layers? Would you say that the implementation in tensorflow (here) is adequate? 
As a last note, given that Ristretto does all computations in regular floating point arithmetic: would you say that the accuracy analysis produced with Ristretto layers is a good predictor of the accuracy one would get from an implementation that performs low precision arithmetic the way it would happen in hardware?

Thanks! 

--nico

pmg...@ucdavis.edu

Jul 26, 2016, 1:48:17 PM
to ristretto-users
Hi Nico,

1) You're correct, we actually simulate 16-bit ALUs in this experiment. My comment with the 9 bits was probably not helpful, please ignore it.

While we see good 8-bit dynamic fixed point results for AlexNet, SqueezeNet and GoogLeNet, ResNet seems to be a little more challenging. As I already mentioned, I didn't manage to trim the ResNet activations to 8 bits, and used 16 bits instead.

2) You ask why Ristretto used 4 exponent bits, i.e. why Ristretto did not detect that it should use 5 exponent bits for the minifloat representation. As mentioned in another question, Ristretto only considers one image batch to find the maximal value of each layer's activations. It happens that this was not enough in your case; Ristretto should look at multiple batches to get more accurate statistics. I will fix this in the code.

3) Quantized normalization layers: We originally did some experiments on the LRN layers of AlexNet and saw that the dynamic range of values is very large: the ratio between the largest and smallest intermediate result can be on the order of 2^20 or more. If you choose one fixed point format for all arithmetic operations, you will incur high quantization errors. (A good definition of LRN can be found in this paper, section 3.3.)

That said, we didn't explore batch normalization in detail. You already mentioned TensorFlow which does have quantized batch norm. Hopefully that will serve you well. Currently I don't plan to add a quantized normalization layer to Ristretto.

4) Accurate simulation of hardware arithmetic: You ask how accurate Ristretto simulates low precision arithmetic. In short, this depends on whether your hardware accelerator would work the way Ristretto assumes it does.

So what Ristretto simulates is reduced word width activations as well as reduced word width parameters. Let's take 8-bit dynamic fixed point as an example and consider only convolutional and fully connected layers. In this case, Ristretto simulates multipliers with two 8-bit inputs (layer input and weights) and 16-bit outputs. These 16-bit results are accumulated in 32-bit fixed point. In the end the layer outputs are again quantized to 8-bit fixed point.

Ristretto performs 3 steps for this simulation. First it gets the (original 32-bit floating point) layer inputs and weights. It quantizes those to 8-bit dynamic fixed point (using Trim2FixedPoint_cpu and Trim2FixedPoint_gpu). The results are converted back to 32-bit floating point. Second, those numbers serve as input to the normal layer computation. Since this consists of multiplication-accumulation, we simulate the low precision arithmetic described in the above paragraph. Third and finally, the layer outputs are quantized (again using Trim2FixedPoint_xxx).
One might argue that single precision numbers only have a 23-bit mantissa, not 32 bits. This means 32-bit floating point might incur more quantization error than 32-bit fixed point during accumulation. In this sense, the actual low precision arithmetic might perform slightly better than what Ristretto simulates. If you use 64-bit floating point (Net<double> ...) you can avoid this uncertainty.
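
To illustrate the three steps described above, here is a minimal standalone sketch for one fully connected layer (an illustration of the idea only, not the actual Trim2FixedPoint code; the exact saturation convention may differ):

    #include <cmath>
    #include <vector>

    // Round-to-nearest quantization to dynamic fixed point with the given bit
    // width and fractional length, saturating on overflow.
    float trim(float v, int bit_width, int fl) {
      float step = std::ldexp(1.0f, -fl);
      float max_val = (std::ldexp(1.0f, bit_width - 1) - 1.0f) * step;
      float min_val = -std::ldexp(1.0f, bit_width - 1) * step;
      float q = std::round(v / step) * step;
      return std::fmin(std::fmax(q, min_val), max_val);
    }

    // One fully connected layer: step 1 quantizes inputs and weights to 8 bits,
    // step 2 runs the usual multiply-accumulate in float (standing in for the
    // 32-bit accumulator), step 3 quantizes the outputs back to 8 bits.
    std::vector<float> fc_simulated(const std::vector<float>& in,
                                    const std::vector<std::vector<float> >& w,
                                    int fl_in, int fl_w, int fl_out) {
      std::vector<float> out(w.size(), 0.0f);
      for (size_t o = 0; o < w.size(); ++o) {
        float acc = 0.0f;
        for (size_t i = 0; i < in.size(); ++i)
          acc += trim(in[i], 8, fl_in) * trim(w[o][i], 8, fl_w);  // steps 1 and 2
        out[o] = trim(acc, 8, fl_out);                            // step 3
      }
      return out;
    }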

Ristretto uses round-nearest in the test phase. So your hardware accelerator would need to round the original weights to 8-bit using the same rounding scheme. Since we assume 32-bit accumulation, we have enough bits to use for rounding (those bits which are "cut off").

I hope this helps,
Philipp

pmg...@ucdavis.edu

Jul 27, 2016, 11:54:58 AM
to ristretto-users
Hi Nico,

I updated Ristretto on Github (commit 44298cf): Ristretto no longer assumes 16-bit quantization works fine. Instead, it uses 32-bit if 16-bit is not enough.

Thanks for mentioning this problem.

saumya...@alumni.iiit.ac.in

Jul 3, 2017, 2:34:19 AM
to ristretto-users
Hi ,

Was anyone able to quantize the activations to 8 bits using dynamic fixed point? Could you please guide me on how you achieved it?

Thanks,
Saumya

pmg...@ucdavis.edu

Jul 5, 2017, 4:48:11 PM
to ristretto-users
Hi Saumya, that's a great question. As you probably realized, Ristretto is unable to reduce the bit-width to 8 bits for this particular network. There are three improvements required for better ResNet quantization. On the Ristretto webpage here, the second, third and fourth points under "Limitations" explain those three required improvements.

Best,
Philipp

fzhu...@gmail.com

May 10, 2018, 1:50:09 PM
to ristretto-users
Hi Nico:

I am running Ristretto with resnet_50, but I always get an accuracy of 0.

I am wondering if I can get some help from you?

Thanks and best regards
Frank


fzhu...@gmail.com

May 11, 2018, 2:00:15 PM
to ristretto-users
Hi All:

This is for resnet50:

I fixed the above problem, but the accuracy is very low compared with SqueezeNet.

I0511 10:54:30.651784 26931 quantization.cpp:141] Loss: 0
I0511 10:54:30.651790 26931 quantization.cpp:153] accuracy = 0.0590808
I0511 10:54:30.651796 26931 quantization.cpp:153] accuracy_top5 = 0.1575
I0511 10:54:31.198470 26931 quantization.cpp:276] ------------------------------
I0511 10:54:31.198493 26931 quantization.cpp:277] Network accuracy analysis for
I0511 10:54:31.198511 26931 quantization.cpp:278] Convolutional (CONV) and fully
I0511 10:54:31.198514 26931 quantization.cpp:279] connected (FC) layers.
I0511 10:54:31.198516 26931 quantization.cpp:280] Baseline 32bit float: 0.728762
I0511 10:54:31.198525 26931 quantization.cpp:281] Dynamic fixed point CONV
I0511 10:54:31.198527 26931 quantization.cpp:282] weights:
I0511 10:54:31.198530 26931 quantization.cpp:284] 16bit:        0.728762
I0511 10:54:31.198534 26931 quantization.cpp:284] 8bit:         0.723982
I0511 10:54:31.198536 26931 quantization.cpp:284] 4bit:         0.001
I0511 10:54:31.198540 26931 quantization.cpp:287] Dynamic fixed point FC
I0511 10:54:31.198559 26931 quantization.cpp:288] weights:
I0511 10:54:31.198562 26931 quantization.cpp:290] 16bit:        0.728762
I0511 10:54:31.198565 26931 quantization.cpp:290] 8bit:         0.728482

The result for SqueezeNet:
I0511 10:09:32.217002 20424 quantization.cpp:153] accuracy = 0.552119
I0511 10:09:32.217010 20424 quantization.cpp:153] accuracy_top5 = 0.783043
I0511 10:09:32.547478 20424 quantization.cpp:276] ------------------------------
I0511 10:09:32.547504 20424 quantization.cpp:277] Network accuracy analysis for
I0511 10:09:32.547523 20424 quantization.cpp:278] Convolutional (CONV) and fully
I0511 10:09:32.547528 20424 quantization.cpp:279] connected (FC) layers.
I0511 10:09:32.547554 20424 quantization.cpp:280] Baseline 32bit float: 0.576799
I0511 10:09:32.547581 20424 quantization.cpp:281] Dynamic fixed point CONV
I0511 10:09:32.547585 20424 quantization.cpp:282] weights: 
I0511 10:09:32.547606 20424 quantization.cpp:284] 16bit: 0.55716
I0511 10:09:32.547612 20424 quantization.cpp:284] 8bit: 0.55596
I0511 10:09:32.547634 20424 quantization.cpp:284] 4bit: 0.00568
I0511 10:09:32.547641 20424 quantization.cpp:287] Dynamic fixed point FC
I0511 10:09:32.547646 20424 quantization.cpp:288] weights: 
I0511 10:09:32.547650 20424 quantization.cpp:290] 16bit: 0.576799
I0511 10:09:32.547655 20424 quantization.cpp:290] 8bit: 0.576799
I0511 10:09:32.547662 20424 quantization.cpp:290] 4bit: 0.576799
I0511 10:09:32.547667 20424 quantization.cpp:290] 2bit: 0.576799
I0511 10:09:32.547672 20424 quantization.cpp:290] 1bit: 0.576799
I0511 10:09:32.547678 20424 quantization.cpp:292] Dynamic fixed point layer
I0511 10:09:32.547683 20424 quantization.cpp:293] activations:
I0511 10:09:32.547688 20424 quantization.cpp:295] 16bit: 0.575799
I0511 10:09:32.547693 20424 quantization.cpp:295] 8bit: 0.57032
I0511 10:09:32.547698 20424 quantization.cpp:295] 4bit: 0.0264003
I0511 10:09:32.547704 20424 quantization.cpp:298] Dynamic fixed point net:
I0511 10:09:32.547709 20424 quantization.cpp:299] 8bit CONV weights,
I0511 10:09:32.547714 20424 quantization.cpp:300] 1bit FC weights,
I0511 10:09:32.547719 20424 quantization.cpp:301] 8bit layer activations:
I0511 10:09:32.547724 20424 quantization.cpp:302] Accuracy: 0.552119
I0511 10:09:32.547729 20424 quantization.cpp:303] Please fine-tune.

Any ideas?

Frank


fzhu...@gmail.com

May 12, 2018, 2:54:45 PM
to ristretto-users
Hi Philipp:

I saw your comment that you achieved 74.5503% with fixed point. Could you share your experience, or the steps for how to do it?

Thanks
Frank

pmg...@ucdavis.edu

May 18, 2018, 7:25:49 AM
to ristretto-users
Hi Frank,

Thanks for your interest in the Ristretto project. Ristretto doesn't do a very good job when quantizing ResNet. In my post from 7/5/17 in this thread, I mention several ways Ristretto can be improved to get better results for ResNet. If you follow those steps, you can get significantly better results, but you'll have to make multiple changes to the source code.

As for your question about the 74.5503% fixed point accuracy: Ristretto doesn't always find a good dynamic fixed point format for the activations of ResNet. So as an experiment, I hardcoded the number format to 16-bit dynamic fixed point for the activations in all layers, with 10 integer bits and 6 fractional bits everywhere. For this I had to change the source code in src/caffe/ristretto/quantization.cpp at line 164; there, il_in_ holds the number of integer bits for input activations and il_out_ the integer bits for output activations.

Best,
Philipp

fzhu...@gmail.com

May 21, 2018, 11:48:16 AM
to ristretto-users
Hi Philipp:

Thank you very much.

After changing the in_place blobs of resnet50, the accuracy improved to 0.722262.

Please see below:

I0518 22:49:41.893957 29549 quantization.cpp:153] accuracy = 0.722262 
I0518 22:49:41.893962 29549 quantization.cpp:153] accuracy_top5 = 0.907823
I0518 22:49:42.209137 29549 quantization.cpp:276] ------------------------------
I0518 22:49:42.209158 29549 quantization.cpp:277] Network accuracy analysis for
I0518 22:49:42.209162 29549 quantization.cpp:278] Convolutional (CONV) and fully
I0518 22:49:42.209164 29549 quantization.cpp:279] connected (FC) layers.
I0518 22:49:42.209167 29549 quantization.cpp:280] Baseline 32bit float: 0.728762
I0518 22:49:42.209174 29549 quantization.cpp:281] Dynamic fixed point CONV
I0518 22:49:42.209177 29549 quantization.cpp:282] weights:
I0518 22:49:42.209179 29549 quantization.cpp:284] 16bit: 0.728762
I0518 22:49:42.209182 29549 quantization.cpp:284] 8bit: 0.723982
I0518 22:49:42.209185 29549 quantization.cpp:284] 4bit: 0.001
I0518 22:49:42.209189 29549 quantization.cpp:287] Dynamic fixed point FC
I0518 22:49:42.209193 29549 quantization.cpp:288] weights:
I0518 22:49:42.209195 29549 quantization.cpp:290] 16bit: 0.728762
I0518 22:49:42.209198 29549 quantization.cpp:290] 8bit: 0.728462
I0518 22:49:42.209201 29549 quantization.cpp:290] 4bit: 0.662362
I0518 22:49:42.209204 29549 quantization.cpp:292] Dynamic fixed point layer
I0518 22:49:42.209206 29549 quantization.cpp:293] activations:
I0518 22:49:42.209209 29549 quantization.cpp:295] 16bit: 0.726762
I0518 22:49:42.209211 29549 quantization.cpp:295] 8bit: 0.00054
I0518 22:49:42.209215 29549 quantization.cpp:298] Dynamic fixed point net:
I0518 22:49:42.209218 29549 quantization.cpp:299] 8bit CONV weights,
I0518 22:49:42.209235 29549 quantization.cpp:300] 8bit FC weights,
I0518 22:49:42.209237 29549 quantization.cpp:301] 16bit layer activations:
I0518 22:49:42.209240 29549 quantization.cpp:302] Accuracy: 0.722262
I0518 22:49:42.209244 29549 quantization.cpp:303] Please fine-tune.

But fixed point does not seem to be working.

This is after changing line 164 to:

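    // Hardcode 10 integer bits for all layer inputs and outputs (with 16-bit
    // activations this gives the [10, 6] format described above); parameters
    // keep Ristretto's usual data-driven integer length.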
    il_in_.push_back(10);
    il_out_.push_back(10);
    il_params_.push_back((int)ceil(log2(max_params_[i])+1));

liu.x...@gmail.com

Sep 19, 2018, 11:31:08 PM
to ristretto-users
Hello, Frank,

Where are the in_place blobs, and how do I modify them? Thanks a lot. 


leo

liu.x...@gmail.com

Oct 22, 2018, 4:57:27 AM
to ristretto-users
hi, Frank,

To go from an accuracy of 0 to 0.5+, what problems did you have to solve and debug to reach that result?

Thanks a lot

Leo

liu.x...@gmail.com

Oct 22, 2018, 9:58:38 PM
to ristretto-users
Hello, Philipp,

I have reviewed this thread and made several modifications to the source code; I saw that some people got good performance:

1. changed line 164 to:

    il_in_.push_back(10);
    il_out_.push_back(10);
    il_params_.push_back((int)ceil(log2(max_params_[i])+1));

2. changed the in-place blobs (in train_val.prototxt, made the bottom name and top name different); see the sketch below:
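
The layer below is only an illustration of the pattern (the names follow the usual ResNet blob naming, but treat them as an example rather than an exact excerpt):

    # Before: the ReLU runs in place, bottom and top share the same blob name.
    layer { name: "res2a_relu" type: "ReLU" bottom: "res2a" top: "res2a" }

    # After: give the top its own name, and update the bottom of every later
    # layer that consumed the activated "res2a" to use the new name instead.
    layer { name: "res2a_relu" type: "ReLU" bottom: "res2a" top: "res2a_relu_out" }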

[attached screenshot]


However, after this, the accuracy is still bad (especially the net accuracy). What is the problem?

[attached screenshot]




Thanks a lot!

Leo