Re: TensorFlow: Could you teach me how to quantize the weights, activation outputs, and gradients?


Kelly Davis
Aug 27, 2016, 4:17:39 AM
to TomohiroUeno-ABEJA, Discuss
Eventually we are planning on using this feature of TensorFlow. However, we haven't gotten to that stage of our project yet.

That said, the blog post How to Quantize Neural Networks with TensorFlow is by one of the people working on the implementation and describes how quantization is done.

Let me know how it turns out, as we'll eventually go down that road too.

On Sat, Aug 27, 2016 at 9:27 AM, TomohiroUeno-ABEJA <b140...@planet.kanazawa-it.ac.jp> wrote:
Hello. I am an intern at ABEJA.
Thank you for reading about my issue.

I am currently trying to implement DoReFa-Net, a binarized neural network, but I have run into trouble:

I don't know how to quantize the weights, the activation outputs, or the gradients.

https://github.com/ppwwyyxx/tensorpack/tree/master/examples/DoReFa-Net
http://arxiv.org/abs/1606.06160

I want to use TensorFlow and have searched for information, but there are very few documents that explain how to do this kind of quantization.

The activation outputs are quantized to 2-bit numbers, the weights to 1 bit, and the gradients to 6-bit numbers.

As I understand it, the gradient is the amount by which the weights are updated during backpropagation.

I am using Python 2.7 and TensorFlow 0.9.

Any kind of advice is welcome. Could you please help me?




--
Kelly Davis
Bringing a voice to Connected Devices

Yuxin Wu
Aug 28, 2016, 9:28:55 AM
to Discuss, b140...@planet.kanazawa-it.ac.jp
The quantize() function here https://github.com/ppwwyyxx/tensorpack/blob/master/examples/DoReFa-Net/dorefa.py#L16
quantizes a real number in (0, 1) to a k-bit fixed-point number. It is also explained in the original paper.

The tutorial How to Quantize Neural Networks with TensorFlow quantizes the network after it is trained, so it is different from DoReFa-Net.
You can easily post-process a network down to 8 bits, but going to 1 or 2 bits after training will certainly break the model.
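
For later readers, here is a minimal sketch of that function. It uses tf.custom_gradient, which is newer than the TF 0.9 discussed in this thread; the actual dorefa.py gets the same effect with a gradient override:

import tensorflow as tf

def quantize_k(x, k):
    # Map a real number in [0, 1] to a k-bit fixed-point number:
    # r_o = round((2^k - 1) * r_i) / (2^k - 1)
    n = float(2 ** k - 1)

    @tf.custom_gradient
    def _quantize(x):
        y = tf.round(x * n) / n
        # Straight-through estimator (Eq. 6): the rounding is treated
        # as the identity in the backward pass, so dC/dr_i = dC/dr_o.
        return y, lambda dy: dy

    return _quantize(x)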

Michael Klachko
Dec 30, 2017, 10:08:08 PM
to Discuss
Hi Yuxin, I'm reading the DoReFa-Net paper, and I'm curious about step 18 in the algorithm description. I don't see this step explained in the paper. Where can I find it in the code?

Thanks,
Michael

Yuxin Wu
Dec 31, 2017, 1:25:27 AM
to Michael Klachko, Discuss
It's the chain rule, and TensorFlow does it automatically.
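
As a sketch of what "automatically" means for step 18 (tf.GradientTape is from eager TF, newer than this thread, and fw below is my rendering of the Eq. 9 weight quantizer with k = 2, not the tensorpack code):

import tensorflow as tf

@tf.custom_gradient
def quantize_2bit(x):
    # Round to {0, 1/3, 2/3, 1}; straight-through gradient (Eq. 6).
    return tf.round(x * 3.0) / 3.0, lambda dy: dy

def fw(w):
    # Eq. 9: squash with tanh, rescale into [0, 1], quantize, map to [-1, 1].
    t = tf.tanh(w)
    t = t / (2.0 * tf.reduce_max(tf.abs(t))) + 0.5
    return 2.0 * quantize_2bit(t) - 1.0

w = tf.Variable([[0.3, -1.2], [0.8, 0.1]])
with tf.GradientTape() as tape:
    w_b = fw(w)                     # the W_b used in the forward pass
    loss = tf.reduce_sum(w_b ** 2)  # toy stand-in for the cost C
# This one call is step 18: autodiff applies the chain rule through
# tanh, the rescaling, and the STE to get dC/dW from dC/dW_b.
print(tape.gradient(loss, w))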


Michael Klachko
Dec 31, 2017, 3:35:56 PM
to Yuxin Wu, Discuss
Yuxin,

I'm not sure I understand the note for Eq. 10. The note points to the definition of the quantize_k operation, which states dC/dr_i = dC/dr_o (Eq. 6); it therefore follows that dr_o/dr_i = 1. Am I correct so far?

If I am correct, then dW_b/dW in step 18 of the algorithm is also 1, and therefore gW_b = gW (we accumulate quantized gradients). Is step 18 assumed to be skipped in the paper, or am I missing something?

This seems to be the case in the code: line 77 of svhn-digit-dorefa.py shows how the fg function (defined on line 39 of dorefa.py) is applied to layers. The fg function produces quantized gradients. It's not clear to me how or where these quantized gradients are transformed before the weight updates are performed.
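
For concreteness, here is my sketch of what I understand fg (Eq. 12) to do: identity in the forward pass, with the quantization injected into the backward pass. I'm using tf.custom_gradient here; the actual tensorpack code registers a gradient override instead:

import tensorflow as tf

def fg(x, bit_g=6):
    n = float(2 ** bit_g - 1)

    @tf.custom_gradient
    def _identity(x):
        def grad(dy):
            # Eq. 12: scale the gradient into [0, 1], add uniform noise
            # N(k), quantize to bit_g bits, then map back to the original
            # range. (The paper takes the max over the non-batch axes; a
            # global max is used here to keep the sketch short.)
            m = tf.reduce_max(tf.abs(dy))
            y = dy / (2.0 * m) + 0.5
            y += tf.random.uniform(tf.shape(dy), -0.5 / n, 0.5 / n)
            y = tf.clip_by_value(y, 0.0, 1.0)  # stay in quantize_k's domain
            return 2.0 * m * (tf.round(y * n) / n - 0.5)
        return tf.identity(x), grad

    return _identity(x)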

Thanks,
Michael


Yuxin Wu
Dec 31, 2017, 3:54:11 PM
to Michael Klachko, Discuss
dr_o/dr_i in Eq. 5 and Eq. 6 is 1.

In Eq. 10, r_o and r_i are something else.
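
Concretely, expanding Eq. 9 (and treating the max term as a constant):

r_o = 2 * quantize_k(tanh(r_i) / (2 * max|tanh(W)|) + 1/2) - 1

Using the STE only for the quantize_k step, the chain rule gives

dr_o/dr_i = (1 - tanh^2(r_i)) / max|tanh(W)|

which is not 1 in general. The STE only makes the rounding transparent; the tanh and the rescaling around it still differentiate normally.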

Michael Klachko
Dec 31, 2017, 4:06:20 PM
to Discuss
OK, so dr_o/dr_i in Eq. 10 involves differentiating Eq. 9, correct? Eq. 9 contains the quantize_k operator, and I assumed we should differentiate it using the STE defined in Eq. 5 and Eq. 6. Is that correct? If yes, then dr_o/dr_i = 1. If not, which STE should we use there? Please help me understand.


Thanks,
Michael