cannot reproduce results of Dynamic Fixed Point


alexey.ch...@gmail.com

Aug 19, 2016, 9:28:40 AM
to ristretto-users
I'm trying to check my understanding of how Dynamic Fixed Point works (including the operations on it), but so far I'm not 100% sure I've got it right. I use PyCaffe for my experiments. I have a CNN that was fine-tuned by Ristretto, and I want to compare the output of its first layer to the output of a similar CNN that uses a regular Convolution layer. The results do not coincide. Here's the code:

# prototxt obtained by Ristretto
name: "myconvolution"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227

layer {
  name: "conv1"
  type: "ConvolutionRistretto"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
  quantization_param {
    bw_layer_in: 8
    bw_layer_out: 8
    bw_params: 8
    fl_layer_in: 0
    fl_layer_out: -3
    fl_params: 7
  }
}

# my non-Ristretto equivalent
name: "myconvolution"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}


Considering the above parameters for bw_* and fl_*, I wrote this Python code to compare the two. First, I create random data, load the Ristretto prototxt + caffemodel, and run forward propagation to get the output. Then I load the non-Ristretto equivalent prototxt and populate its weights and biases with those obtained from Ristretto, transformed according to the formulas I found in Ristretto's BaseRistrettoLayer code, in the function Trim2FixedPoint_cpu:

import numpy as np
import caffe

# load pre-trained Ristretto model (proto_path and caffemodel point to the Ristretto-generated files)
net = caffe.Net(proto_path, caffemodel, caffe.TEST)
w = net.params['conv1'][0].data
b = net.params['conv1'][1].data

# get output of model from Ristretto
data = (255*np.random.random((3,227,227))).astype(np.int32)-128
net.blobs['data'].data[...] = data
output = net.forward()['conv1']

# load similar architecture, but with regular (non-Ristretto) convolutional layers
net2 = caffe.Net('/home/debugCNN.prototxt', caffe.TEST)

# saturate weights and biases to [-(2^7)/2^7,  (2^7 - 1)/2^7], which comes from  - 2^(bw_params-1)/2^fl_params and (2^(bw_params-1)-1)/2^fl_params
w[w<-1] = -1
b[b<-1] = -1
w[w>0.9921875] = 0.9921875
b[b>0.9921875] = 0.9921875

# shift according to fl_params=7, and round
w = np.round(w/(2**-7))
b = np.round(b/(2**-7))

# shift back - this is delayed after output is obtained
#w = w*(2**-7)
#b = b*(2**-7)
# fill non-Ristretto regular convolution layer weight and biases with the modified values:
net2.params['conv1'][0].data[...] = w
net2.params['conv1'][1].data[...] = b
net2.blobs['data'].data[...] = data
output2 = net2.forward()['conv1']


# shift result because I did not shift the weights and biases:
output2 = np.round(output2*(2**-7))

# now transform result according to bw_layer_out=8 and fl_layer_out=-3
# saturate to [-(2^7)/(2^-3),  (2^7 - 1)/(2^-3)]
output2[output2<-1024] = -1024
output2[output2>1016] = 1016

# shift according to fl_layer_out=-3  and round
output2 = np.round(output2/(2**3))
output2 = output2*(2**3)   # shift back, i.e. multiply by 2^(-fl) with fl_layer_out=-3


When I execute all of the above, output and output2 range roughly between -630 and +630 (depending on the random initialization) and seem to correlate, but the difference between them is not zero: it is uniformly distributed between -8 and +8. So I guess I am doing something wrong. Could anyone please help me? I need this because I want to implement a pre-trained Ristretto layer on a special platform in fixed point.
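(For reference, this is roughly how I look at the error distribution, with output and output2 as in the code above:)

diff = output - output2
vals, counts = np.unique(diff, return_counts=True)
print(list(zip(vals, 100.0 * counts / diff.size)))   # error value vs. percentage of entries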

pmg...@ucdavis.edu

Aug 19, 2016, 3:02:11 PM
to ristretto-users
Hi Alexey,

Thanks for your question. Let me rephrase what you are trying to do: you do the forward propagation of one Ristretto conv layer, and then you want to reproduce the layer outputs with a normal convolutional layer. This should indeed work, since the simulation of dynamic fixed point is done by quantizing the weights and inputs, doing the normal forward pass, and then quantizing the output.

Now, as you saw, Trim2FixedPoint_cpu shows how Ristretto does the quantization:
- Saturate the number
- Shift the number to the left (for FL > 0), according to the fractional length
- Round (to get rid of the fractional digits)
- Shift back
All of this happens in floating point format.
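In numpy terms, those steps could be sketched roughly like this (my paraphrase of the logic, not the actual Ristretto source):

import numpy as np

def trim_to_fixed_point(x, bw, fl):
    # saturate to the representable range of the dynamic fixed point format
    max_val = (2 ** (bw - 1) - 1) * 2.0 ** (-fl)
    min_val = -(2 ** (bw - 1)) * 2.0 ** (-fl)
    x = np.clip(x, min_val, max_val)
    # shift by the fractional length, round away the fractional part, shift back
    return np.round(x / 2.0 ** (-fl)) * 2.0 ** (-fl)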

I see some minor differences in your code. Why do you delay shifting the weights back? If you do that right away, then you don't need to shift the output prior to saturation. Finally, you should also quantize the layer input.
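With a helper like the one above, the whole comparison would roughly be (again just a sketch, reusing your variable names and the bw/fl values from your prototxt):

net2.params['conv1'][0].data[...] = trim_to_fixed_point(w, 8, 7)   # bw_params, fl_params
net2.params['conv1'][1].data[...] = trim_to_fixed_point(b, 8, 7)
net2.blobs['data'].data[...] = trim_to_fixed_point(data, 8, 0)     # bw_layer_in, fl_layer_in (a no-op here, your data is already integer in [-128, 127])
out2 = net2.forward()['conv1']
out2 = trim_to_fixed_point(out2, 8, -3)                            # bw_layer_out, fl_layer_out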

Let me know if you can reproduce the same results,
Best,
Philipp

alexey.ch...@gmail.com

Aug 22, 2016, 9:14:57 AM
to ristretto-users
Hi Philipp,
The reason why I delay shifting the weights back is that on my target architecture (which I want to emulate first, using the Python code) I only have integer units. If I take the Ristretto-produced weights (they are floats), multiply them by 2^7, round, and then shift the resulting value (treated as an integer) back 7 bits to the right, I get zeros, and the convolution result is all zeros. That is why I first do the convolution with the unshifted weights, and only afterwards shift the result by 7 bits.
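A toy sketch of what I mean, on a single dot product (made-up values):

import numpy as np

w = np.array([0.30, -0.12, 0.05], dtype=np.float32)   # Ristretto weights (floats)
x = np.array([17, -4, 100], dtype=np.int32)           # integer input
w_int = np.round(w / 2.0 ** -7).astype(np.int32)      # scale by 2^7 and round -> integer weights
acc = np.dot(w_int, x)                                 # integer multiply-accumulate
result = acc >> 7                                      # shift the accumulator back by 7 bits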

When I look at the entries of output-output2, I get these results:

When using
output2 = np.round(output2*(2**-7))
94% of the entries are 0; 3% are +8, 3% are -8

When using
output2 = output2.astype(np.int32) >> 7
50% of the entries are 0; 50% are +8

When I shift the weights back before computing the output (as you suggest), I get a 100% correct result.
So, schematically:

1) shift weights => keep them as ints => convolution => scale output back => saturate & quantize output => incorrect result
2) shift weights, round, shift back => weights as floats => convolution => saturate & quantize output => correct result

This is great, but again, Ristretto was made to create fixed-point-ready CNNs, and yet I cannot reproduce the Ristretto results in a pure fixed-point scenario. I may have some incorrect assumptions in my Python code (something about np.round and the conversions to and from float32 and int32).

Philipp Gysel

Oct 1, 2016, 7:07:06 PM
to ristretto-users
Hi Alexey,

The weights stored in the *.caffemodel are always in high precision (32-bit floating point). They get quantized on the fly during forward propagation, but the stored weights themselves stay in high precision. Sorry I didn't understand your question in the first place. Here is a similar post.

Best,
Philipp

alexey.ch...@gmail.com

Oct 6, 2016, 11:49:08 AM
to ristretto-users
Hello Philipp,

I realized my mistake:

When I rescaled back my integer output by dividing it by 2^7, I did this (see my previous post):

output2 = output2.astype(np.int32) >> 7

However, this operation truncates the result, which is not equivalent to rounding! So, to obtain results equivalent to those of Ristretto's dynamic fixed point, but using regular Caffe convolution layers AND integer numbers everywhere, I had to first multiply the weights and biases by 2^7, and then divide the integer result of the convolution by 2^7 with ROUNDING, not truncation as in the code snippet above.
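A tiny example of the difference (made-up number, already scaled by 2^7):

import numpy as np
x = np.int32(200)                     # 200 / 128 = 1.5625
print(x >> 7)                         # 1  (truncation)
print(int(np.round(x * 2.0 ** -7)))   # 2  (rounding to nearest)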

Upon searching the web, I found a nice snippet for rounding. And I do

output2 = (output2.astype(np.int32) + 2**6) >> 7   # add half of the divisor (2^7 / 2) before shifting
# here I assume that output2 >= 0; the code is slightly different when output2 is negative

When I plugged in this line, I got complete agreement between the convolution results in Ristretto and in the regular Caffe convolution layers, with integer weights, biases and outputs. And I was able to take a Ristretto-produced version of SqueezeNet and successfully port it to an architecture (simulator) that works with integer numbers only, and I got the same results. So thanks for your tool and keep up the good work!
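For completeness, a variant of that rounding shift that also handles negative values (my own sketch, rounding ties away from zero; fl must be >= 1):

import numpy as np

def round_shift(x, fl):
    # round-to-nearest right shift by fl bits, ties rounded away from zero
    x = np.asarray(x, dtype=np.int64)
    half = 1 << (fl - 1)
    return np.where(x >= 0, (x + half) >> fl, -((-x + half) >> fl))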

yixin...@gmail.com

Jun 5, 2018, 7:09:02 PM
to ristretto-users
Hi Philipp,

I'm trying to train a simple residual-learning-based loop filter (LF) using Caffe_DFP, which is a modified version of Ristretto Caffe. However, after training, I found that all the weights of the conv layers in the caffemodel are zero, even though the training loss did decrease from 1000 to 0.3. This zero-weights problem also happens when I try to train another simple network such as VDSR with Caffe_DFP. As a comparison, when I train LF (without quantization) or VDSR using the official Caffe, this issue does not exist.

The repository for Caffe_DFP can be found at:  https://github.com/Hikvision-Codec/Caffe_DFP

I have attached my train.prototxt and solver, as well as the loss plot, in this group thread: https://groups.google.com/forum/#!topic/caffe-users/Hu90fYJnGHY

Could you please help take a look?

Thank you very much!
Yixin