Re: TensorFlow: Could you teach me how to quantize the weights, activation outputs, and gradients?


Kelly Davis
Aug 27, 2016, 4:17:39 AM
to TomohiroUeno-ABEJA, Discuss
Eventually we are planning on using this feature of TensorFlow. However, we haven't gotten to that stage of our project yet.

That said, the blog post How to Quantize Neural Networks with TensorFlow is by one of the people working on the implementation and describes how quantization is done.

Let me know how it turns out, as we'll eventually go down that road too.

On Sat, Aug 27, 2016 at 9:27 AM, TomohiroUeno-ABEJA <b140...@planet.kanazawa-it.ac.jp> wrote:
Hello. I am an intern at ABEJA.
Thank you for reading about my issue.

I am currently trying to implement DoReFa-Net, a binarized neural network, but I have run into trouble:

I don't know how to quantize the weights, the activation outputs, or the gradients.

https://github.com/ppwwyyxx/tensorpack/tree/master/examples/DoReFa-Net
http://arxiv.org/abs/1606.06160

I want to use TensorFlow and have searched for information, but there are very few documents that explain how to do this kind of quantization.

The activation outputs are quantized to 2-bit numbers, the weights to 1 bit, and the gradients to 6-bit numbers.

As I understand it, the gradient is the amount by which the weights are updated during backpropagation.

I am using Python 2.7 and TensorFlow 0.9.

Any kind of advice is welcome. Could you please help me?




--
Kelly Davis
Bringing a voice to Connected Devices

Yuxin Wu
Aug 28, 2016, 9:28:55 AM
to Discuss, b140...@planet.kanazawa-it.ac.jp
The quantize() function here https://github.com/ppwwyyxx/tensorpack/blob/master/examples/DoReFa-Net/dorefa.py#L16
quantizes a real number in (0, 1) to a k-bit fixed-point number. It is also explained in the original paper.

The tutorial How to Quantize Neural Networks with TensorFlow quantizes the network after it is trained, so it is different from DoReFa-Net.
You can easily post-process a network down to 8 bits, but going to 1 or 2 bits after training will certainly break the model.
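
For later readers, here is a minimal sketch of that function. It uses tf.custom_gradient, which is newer than the TF 0.9 discussed in this thread; the actual dorefa.py gets the same effect with a gradient override:

import tensorflow as tf

def quantize_k(x, k):
    # Map a real number in [0, 1] to a k-bit fixed-point number:
    # r_o = round((2^k - 1) * r_i) / (2^k - 1)
    n = float(2 ** k - 1)

    @tf.custom_gradient
    def _quantize(x):
        y = tf.round(x * n) / n
        # Straight-through estimator (Eq. 6): the rounding is treated
        # as the identity in the backward pass, so dC/dr_i = dC/dr_o.
        return y, lambda dy: dy

    return _quantize(x)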

Michael Klachko
Dec 30, 2017, 10:08:08 PM
to Discuss
Hi Yuxin, I'm reading the DoReFa-Net paper, and I'm curious about step 18 in the algorithm description. I don't see this step explained in the paper. Where can I find it in the code?

Thanks,
Michael

Yuxin Wu
Dec 31, 2017, 1:25:27 AM
to Michael Klachko, Discuss
It's the chain rule, and TensorFlow does it automatically.
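
As a sketch of what "automatically" means for step 18 (tf.GradientTape is from eager TF, newer than this thread, and fw below is my rendering of the Eq. 9 weight quantizer with k = 2, not the tensorpack code):

import tensorflow as tf

@tf.custom_gradient
def quantize_2bit(x):
    # Round to {0, 1/3, 2/3, 1}; straight-through gradient (Eq. 6).
    return tf.round(x * 3.0) / 3.0, lambda dy: dy

def fw(w):
    # Eq. 9: squash with tanh, rescale into [0, 1], quantize, map to [-1, 1].
    t = tf.tanh(w)
    t = t / (2.0 * tf.reduce_max(tf.abs(t))) + 0.5
    return 2.0 * quantize_2bit(t) - 1.0

w = tf.Variable([[0.3, -1.2], [0.8, 0.1]])
with tf.GradientTape() as tape:
    w_b = fw(w)                     # the W_b used in the forward pass
    loss = tf.reduce_sum(w_b ** 2)  # toy stand-in for the cost C
# This one call is step 18: autodiff applies the chain rule through
# tanh, the rescaling, and the STE to get dC/dW from dC/dW_b.
print(tape.gradient(loss, w))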


Michael Klachko
Dec 31, 2017, 3:35:56 PM
to Yuxin Wu, Discuss
Yuxin,

I'm not sure I understand the note for Eq. 10. The note points to the definition of the quantize_k operation, which states dC/dr_i = dC/dr_o (Eq. 6); it therefore follows that dr_o/dr_i = 1. Am I correct so far?

If I am correct, then dW_b/dW in step 18 of the algorithm is also 1, and therefore gW_b = gW (we accumulate quantized gradients). Is step 18 assumed to be skipped in the paper, or am I missing something?

This seems to be the case in the code: line 77 of svhn-digit-dorefa.py shows how the fg function (defined on line 39 of dorefa.py) is applied to layers. The fg function produces quantized gradients. It's not clear to me how or where these quantized gradients are transformed before the weight updates are performed.
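
For concreteness, here is my sketch of what I understand fg (Eq. 12) to do: identity in the forward pass, with the quantization injected into the backward pass. I'm using tf.custom_gradient here; the actual tensorpack code registers a gradient override instead:

import tensorflow as tf

def fg(x, bit_g=6):
    n = float(2 ** bit_g - 1)

    @tf.custom_gradient
    def _identity(x):
        def grad(dy):
            # Eq. 12: scale the gradient into [0, 1], add uniform noise
            # N(k), quantize to bit_g bits, then map back to the original
            # range. (The paper takes the max over the non-batch axes; a
            # global max is used here to keep the sketch short.)
            m = tf.reduce_max(tf.abs(dy))
            y = dy / (2.0 * m) + 0.5
            y += tf.random.uniform(tf.shape(dy), -0.5 / n, 0.5 / n)
            y = tf.clip_by_value(y, 0.0, 1.0)  # stay in quantize_k's domain
            return 2.0 * m * (tf.round(y * n) / n - 0.5)
        return tf.identity(x), grad

    return _identity(x)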

Thanks,
Michael


Yuxin Wu
Dec 31, 2017, 3:54:11 PM
to Michael Klachko, Discuss
dr_o/dr_i in Eq. 5 and Eq. 6 is 1.

In Eq. 10, r_o and r_i are something else.
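
Concretely, expanding Eq. 9 (and treating the max term as a constant):

r_o = 2 * quantize_k(tanh(r_i) / (2 * max|tanh(W)|) + 1/2) - 1

Using the STE only for the quantize_k step, the chain rule gives

dr_o/dr_i = (1 - tanh^2(r_i)) / max|tanh(W)|

which is not 1 in general. The STE only makes the rounding transparent; the tanh and the rescaling around it still differentiate normally.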

Michael Klachko
Dec 31, 2017, 4:06:20 PM
to Discuss
OK, so dr_o/dr_i in Eq. 10 involves differentiating Eq. 9, correct? Eq. 9 contains the quantize_k operator, and I assumed we should differentiate it using the STE defined in Eq. 5 and Eq. 6. Is that correct? If yes, then dr_o/dr_i = 1. If not, which STE should we use there? Please help me understand.


Thanks,
Michael