Save caffemodel containing the quantized weights


A

Oct 20, 2016, 10:48:09 AM
to ristretto-users
Hi,

From what I understand, the caffemodel saved after fine-tuning on a quantized prototxt net definition still stores the values as 32-bit floats.

Is it possible to store a model fit for deployment using 8-bit dynamic fixed point (i.e., a caffemodel that is roughly 1/4 of the original size)?

Best regards

Philipp Gysel

Oct 24, 2016, 1:48:29 PM
to ristretto-users
Hi there,

Unfortunately, Ristretto does not support this feature. However, it would definitely be a nice feature to have. As you pointed out, Ristretto keeps a set of 32-bit float weights. Those weights get quantized during the forward path and are stored (as floats) in a separate blob.

Let me know in case you are interested in writing C++ code to enable the feature you described.
Best,
Philipp

desaipr...@gmail.com

Apr 24, 2017, 8:11:28 AM
to ristretto-users
Hi Philipp,

So in short: is it possible to deploy the quantized.prototxt, together with the fine-tuned model and the Ristretto layers, on low-end hardware (e.g., a Raspberry Pi) for testing (scoring)?

You mentioned that Ristretto keeps a set of 32-bit float weights, and elsewhere that the .caffemodel stores the weights as 32-bit floats (which does not reduce the size of the original model, and Ristretto does not save new .caffemodels), but that those weights are quantized on the fly during forward propagation. Does that mean the reduction to an 8-bit word width via quantization shrinks the MAC operations during testing, so that low-end hardware can support testing models via the Ristretto layers? Am I correct?

Best,
Pratik

pmg...@ucdavis.edu

Apr 24, 2017, 10:19:00 AM
to ristretto-users
Hi Pratik,

Thanks for your follow-up question. In short, Ristretto simulates fixed point arithmetic, whereas you want to actually use fixed point arithmetic, e.g. store weights in 8-bit fixed point and use 8-bit fixed point multipliers. As I understand it, you want to run a fixed point network on a Raspberry Pi. To achieve this, you will have to create a "compiler" which takes a Ristretto network (the original .caffemodel and the .prototxt generated by Ristretto) and generates the appropriate C code that uses low-precision arithmetic. If you're interested in creating such a compiler, let me know and I can give you some more information.
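To make this concrete, here is a minimal sketch of the kind of inner loop such generated code could contain. This is not part of Ristretto; the function name and the fractional-bit parameters (fl_in, fl_w, fl_out) are purely illustrative:

#include <cstdint>

// Hypothetical generated low-precision code: an 8-bit fixed point dot
// product with a wide 32-bit accumulator, matching the "8-bit multipliers,
// large accumulators" model described in the next paragraph. fl_in, fl_w
// and fl_out are the fractional bit counts of inputs, weights and output.
int8_t fixed_point_dot(const int8_t* x, const int8_t* w, int n,
                       int fl_in, int fl_w, int fl_out) {
  int32_t acc = 0;  // large accumulator, no saturation during the sum
  for (int i = 0; i < n; ++i) {
    acc += static_cast<int32_t>(x[i]) * static_cast<int32_t>(w[i]);
  }
  // The raw sum has fl_in + fl_w fractional bits; shift to the output
  // format (assumes fl_in + fl_w >= fl_out).
  acc >>= (fl_in + fl_w - fl_out);
  // Saturate to the 8-bit output range.
  if (acc > INT8_MAX) acc = INT8_MAX;
  if (acc < INT8_MIN) acc = INT8_MIN;
  return static_cast<int8_t>(acc);
}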

Let me try to explain in more detail how Ristretto works: Ristretto simulates fixed point arithmetic, but actually uses 32-bit floating point numbers. Without loss of generality, let's assume we want to simulate an 8-bit CNN layer. The convolutional and fully connected layers mainly consist of a matrix-matrix multiplication. To simulate these MAC operations, Ristretto takes both floating point matrices (layer inputs and weights), quantizes them to 8-bit fixed point, and de-quantizes them back to floating point. At this point the two matrices are equal to the original matrices, plus the quantization error.

The forward path of the layer is then done in floating point, with a regular 32-bit floating point matrix-matrix multiplication. What we simulate here is 8-bit multipliers feeding large accumulators (say 32-bit accumulators) in which no saturation occurs. The result of the matrix-matrix multiplication is a floating point matrix, which contains the de-quantized values we would get from real 8-bit fixed point arithmetic. These floating point results are again quantized to 8-bit fixed point and de-quantized, which completes the simulation of a fixed point layer's forward propagation.
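For illustration, here is a simplified standalone sketch of the quantize/de-quantize step described above (not the actual Ristretto code; the real implementation is Trim2FixedPoint_cpu(), quoted later in this thread):

#include <algorithm>
#include <cmath>

// Simulate dynamic fixed point on a float value: saturate to the
// representable range, round to the fixed point grid, and scale back to
// float. The result equals the original value plus the quantization error.
// bit_width is the total word width, fl the number of fractional bits.
float QuantizeDequantize(float value, int bit_width, int fl) {
  const float step = std::pow(2.0f, -fl);
  const float max_val = (std::pow(2.0f, bit_width - 1) - 1) * step;
  const float min_val = -std::pow(2.0f, bit_width - 1) * step;
  value = std::max(std::min(value, max_val), min_val);  // saturate
  return std::round(value / step) * step;               // round, de-quantize
}

In the simulation, layer inputs and weights go through this function, the matrix-matrix multiplication itself runs in regular 32-bit float, and the layer outputs go through the function once more.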

I hope this helps.
Philipp

desaipr...@gmail.com

Apr 24, 2017, 12:45:41 PM
to ristretto-users
Hi Philipp,

Thank you for quick reply.

I have installed Ristretto Caffe on a Raspberry Pi. I wanted to test the quantized model (8-bit dynamic fixed point) for evaluation using the Ristretto (convolution and other) layers. But as you mentioned, Ristretto only simulates fixed point, so now I understand that we need a separate compiler.

But in the question above, the user asked whether it is possible to store a model fit for deployment using 8-bit dynamic fixed point (i.e., a caffemodel that is roughly 1/4 of the original size), and you mentioned that Ristretto Caffe does not support this at the moment.

I am interested in writing C++ code to enable the feature where Ristretto generates a reduced-size caffemodel (8-bit dynamic fixed point) fit for deployment on a Raspberry Pi and other hardware. Could you please give me some pointers and more information on how to enable this in Ristretto? I eventually want to use a reduced-size (quantized) model on hardware for scoring without much loss of accuracy.

Thank you very much.

Best regards,
Pratik 

desaipr...@gmail.com

Apr 27, 2017, 5:53:32 AM
to ristretto-users
Hi Philipp,

I am interested in writing C++ code to enable this feature in Ristretto to store a reduced-size caffemodel (using 8-bit dynamic fixed point). Could you please give me some pointers and more information on how to enable it in Ristretto?

Thanks in advance!

Best,
Pratik

pmg...@ucdavis.edu

Apr 30, 2017, 8:50:43 PM
to ristretto-users
Sure, I'm happy to give you some more information. Let me send you an email with more details.

Best,
Philipp

jiaxin...@gmail.com

May 16, 2017, 9:12:22 AM
to ristretto-users
Hi Philipp, 

I'm also interested in storing a reduced-size caffemodel (8-bit dynamic fixed point) with Ristretto. Could you please give me some information about the implementation details?

Thanks a lot! I'm looking forward to your reply! Have a nice day!

Best regards,
Jasmine

pmg...@ucdavis.edu

May 22, 2017, 12:42:15 PM
to ristretto-users
Hey Jasmine,

Thanks for the question. Yes, you can create a compressed representation of the weights. Note, though, that Caffe uses float values to store weights in .caffemodel files, whereas you want to use a different representation.

Here's how you can dump fixed point weights to a file that uses a modified format to store weight values:
In Ristretto, each layer stores its weight fixed point format in the class variables BaseRistrettoLayer::bw_params_ and BaseRistrettoLayer::fl_params_. During forward propagation, we quantize the floating point weights on the fly. You'll want to modify the code so the quantized weights get dumped to a file. Let's assume you run Ristretto in CPU mode, so each Ristretto layer will call BaseRistrettoLayer::Trim2FixedPoint_cpu(). This is where we quantize and de-quantize the weights. In base_ristretto_layer.cpp, lines 89 to 96, we shift down the float values and round them. At this point you have a fixed point value, stored in a float variable (data[index]). Now you can convert all these values to int8_t (or whatever bit-width you use) and dump them to a file.
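As a rough sketch (the helper name and the raw binary format are made up for illustration; in practice you would call something like this from Trim2FixedPoint_cpu(), right after the rounding step and before the values are scaled back to float):

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: data[] holds the rounded fixed point values, still
// stored in float variables as described above. Narrow them to int8_t and
// append them to a raw binary file. The fixed point format per layer
// (bw_params_ / fl_params_) must be recorded separately so the values can
// be interpreted later.
void DumpQuantizedWeights(const float* data, int cnt, const char* path) {
  std::vector<int8_t> quantized(cnt);
  for (int i = 0; i < cnt; ++i) {
    quantized[i] = static_cast<int8_t>(data[i]);
  }
  if (FILE* f = std::fopen(path, "ab")) {  // append: one chunk per layer
    std::fwrite(quantized.data(), sizeof(int8_t), cnt, f);
    std::fclose(f);
  }
}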

Let me know if that helps. I hope I understood your question correctly.
Best,
Philipp

cindy...@gmail.com

Oct 26, 2017, 10:34:10 PM
to ristretto-users
You mean Ristretto does not save the quantized weights in the .caffemodel? But I found that the size of the .solverstate is twice that of the .caffemodel. Could you give me some clues about how I can save the quantized weights? Thanks.

pmg...@ucdavis.edu

Oct 28, 2017, 5:12:46 AM
to ristretto-users
Hi there,

That's correct, Ristretto does NOT save the quantized weights. For more information on how to get the quantized weights, please read my post from May 22.

Here's some additional info; part of it is already contained in the thread above. To be compatible with standard Caffe, Ristretto uses weights stored as 32-bit floats. If you just quantize a trained model, Ristretto does not modify the original weights. If you use fine-tuning, Ristretto generates new weights, but again, they will be in 32-bit float format. Ristretto quantizes those float weights on the fly, during the forward path.

As for the solver file: This file won't help you. It's just used to store some information during training (like the momentum of the weights).

I hope that helps,
Philipp

chaohe...@gmail.com

Oct 9, 2018, 9:40:35 PM
to ristretto-users
Hello Pratik,
Have you implemented this feature, i.e., using quantized.prototxt to store a reduced-size caffemodel (8-bit dynamic fixed point, roughly 1/4 of the original size)?

chaohe...@gmail.com

Oct 9, 2018, 11:14:42 PM
to ristretto-users
Hi,
From the SqueezeNet example ("Replace 32-bit FP multiplications by 8-bit fixed point, at an absolute accuracy drop below 1%") I can get quantized.prototxt, but how can I get a quantized .caffemodel (8-bit dynamic fixed point, roughly 1/4 of the original size), as shown by the red mark in the figure below?

[Attachment: squeeze_8bit.png]



barr...@gmail.com

Oct 20, 2018, 5:03:13 AM
to ristretto-users
Hey,

Ristretto does not really quantize the original weights (i.e., it does not modify the .caffemodel). Before executing a convolution or fully-connected layer, it reduces the precision of the floating point weights (and of the input data as well). The effect is that the convolutional layer is computed as if the weights and data had a reduced bit width.

If you really want to save the reduced-precision weights, you can write a script that converts the weights according to your quantization configuration and stores the result in a new .caffemodel. I would start by looking at Trim2FixedPoint_cpu in https://github.com/pmgysel/caffe/blob/master/src/caffe/ristretto/layers/base_ristretto_layer.cpp.

You can use this call to prepare your weights as follows:

template <typename Dtype>
void BaseRistrettoLayer<Dtype>::Trim2FixedPoint_cpu(Dtype* data, const int cnt,
      const int bit_width, const int rounding, int fl) {
  for (int index = 0; index < cnt; ++index) {
    // Saturate data to the representable fixed point range
    Dtype max_data = (pow(2, bit_width - 1) - 1) * pow(2, -fl);
    Dtype min_data = -pow(2, bit_width - 1) * pow(2, -fl);
    data[index] = std::max(std::min(data[index], max_data), min_data);
    // Round data to integers on the fixed point grid
    data[index] /= pow(2, -fl);
    switch (rounding) {
    case QuantizationParameter_Rounding_NEAREST:
      data[index] = round(data[index]);
      break;
    case QuantizationParameter_Rounding_STOCHASTIC:
      data[index] = floor(data[index] + RandUniform_cpu());
      break;
    default:
      break;
    }
    // data[index] *= pow(2, -fl); /* DO NOT SCALE BACK */
  }

  // TO IMPLEMENT: data[index] now holds an integer value (in a float
  // variable), so it can be narrowed to int8_t and stored. new_caffe_model
  // is a placeholder for whatever storage you create for the new model.
  for (int index = 0; index < cnt; ++index) {
    new_caffe_model[layer].weights[index] = (int8_t)data[index];
  }
}

Now you should be able to use the stored weights like this during inference:

template <typename Dtype>
void BaseRistrettoLayer<Dtype>::Trim2FixedPointWeights_cpu(Dtype* data, const int cnt,
      const int bit_width, const int rounding, int fl) {
  // Load the stored int8 weights and scale them back to float so the
  // regular forward path can consume them (bit_width and rounding are
  // unused here; the values were already rounded when they were stored).
  for (int index = 0; index < cnt; ++index) {
    data[index] = (Dtype)new_caffe_model[layer].weights[index] * pow(2, -fl);
  }
}

chaohe...@gmail.com

Oct 27, 2018, 10:11:11 AM
to ristretto-users
Thanks, Philipp.
Can Ristretto be used in caffe-ssd to quantize S3FD?
Teemo

DI Scofield

May 14, 2021, 3:02:18 AM
to ristretto-users
Hello, have you figured out how to quantize caffe-ssd using Ristretto?