Integer / Fixed-Point / Quantized Neural Networks

David Gschwend
Mar 2, 2016, 8:01:41 AM
to torch7
Hi all!

I'm currently looking into training networks using fixed-point / integer weights and activations instead of floats (with an FPGA-based accelerator for CNNs in mind).

I see that torch itself has basic support for integer Tensors (ByteTensor, LongTensor, CudaByteTensor, CudaLongTensor, ...).
But it seems like nn (and cunn) only supports Float/Double Tensors:

Spatial Convolution with FloatTensor works just fine:
nn.SpatialConvolution(1,3,3,3):type('torch.FloatTensor'):forward(torch.FloatTensor(1,1,5,5))
... (correct output)

Spatial Convolution with ByteTensor fails:
nn.SpatialConvolution(1,3,3,3):type('torch.ByteTensor'):forward(torch.ByteTensor(1,1,5,5))
... (error: torch/install/share/lua/5.1/nn/SpatialConvolution.lua:100: attempt to index field 'THNN' (a nil value).)
Looks like the THNN backend functions are missing for ByteTensor & co.
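A quick way to check this (a sketch inferred from the error message above, not from the THNN docs):

require 'nn'
-- nn modules dispatch through input.THNN; judging from the error above, this
-- field only exists for the floating-point tensor types.
print(torch.FloatTensor(1, 1, 5, 5).THNN)  -- non-nil: the float backend table
print(torch.ByteTensor(1, 1, 5, 5).THNN)   -- nil, hence the failure in SpatialConvolution.lua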

Has anyone ever worked with fixed-point numbers in torch/nn?
Are there any plans on supporting this?

Best regards,
David 

Jonghoon Jin
Mar 2, 2016, 11:22:53 PM
to torch7 on behalf of David Gschwend
You can try a few byte-precision modules using this package.

It only supports:
 - convolution
 - maxpool
 - threshold

David Gschwend
Mar 3, 2016, 12:03:14 PM
to torch7
Thanks a lot! Have you gone any further than the precision and speed tests with random data?
Did you try inference with real weights and images, or have you tried training (e.g. fixed-point fprop, float bprop)?

Jonghoon Jin
Mar 4, 2016, 11:44:28 AM
to torch7 on behalf of David Gschwend
No, I did not try any training with the modules.

李旻
Jul 17, 2017, 1:11:56 PM
to torch7
Hi David,

I have read your master's thesis, "ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network", and I am also excited about implementing DNNs on FPGAs. I wonder if you have figured out how to train a fixed-point network in Torch. I'm working on a project to implement a fixed-point RNN on an FPGA, and as a first stage I want to train a quantized RNN in Torch.

Best regards,
Min Li

David Gschwend
Jul 18, 2017, 3:18:59 AM
to torch7
Hi Min Li

I never figured that out. I used Caffe for training and Ristretto for quantization (floating-point training -> quantization -> fixed-point inference).
So far I have never trained a network in fixed-point format.

Best regards
David

Arjun Jain
Jul 18, 2017, 6:14:08 AM
to torch7 on behalf of David Gschwend
Keep two versions of the weights and biases: a floating-point copy, and a second copy that is the floating-point one truncated to fixed-point. For fprop, truncate the floating-point weights to fixed-point and also truncate the output to fixed-point. For bprop, update the floating-point copy normally (so no change there).
But you can actually do better and use 2's complement (https://github.com/MatthieuCourbariaux/deep-learning-multipliers/issues/3).
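A minimal sketch of that scheme in plain Torch (hedged: the to_fixed helper, the fraction width, and the training-step wrapper are illustrative assumptions, not taken from any package mentioned here):

require 'nn'

-- Illustrative helper: snap a float tensor onto a fixed-point grid with
-- `frac` fractional bits (rounding instead of truncation, for simplicity).
local function to_fixed(t, frac)
  local scale = 2 ^ frac
  return t:clone():mul(scale):round():div(scale)
end

local net = nn.Sequential()
  :add(nn.SpatialConvolution(1, 8, 3, 3))
  :add(nn.ReLU())

local params, gradParams = net:getParameters()
local floatParams = params:clone()   -- master copy kept in floating point
local frac = 8                       -- assumed fraction width

-- One training step: fixed-point weights/activations in fprop, normal update
-- of the floating-point master copy afterwards.
local function step(input, target, criterion, lr)
  params:copy(to_fixed(floatParams, frac))      -- load quantized weights into the net
  local qinput = to_fixed(input, frac)
  local output = to_fixed(net:forward(qinput), frac)
  local loss = criterion:forward(output, target)
  net:zeroGradParameters()
  net:backward(qinput, criterion:backward(output, target))
  floatParams:add(-lr, gradParams)              -- update the float copy normally
  return loss
end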

李旻
Jul 18, 2017, 1:23:56 PM
to torch7
Hi David,

Thanks for your reply. Your project on GitHub really inspired me, thanks.
It's a little tricky to train an RNN using Caffe, so I will try to figure out how to train it in fixed-point format in Torch.
Yesterday I found a project that can quantize the weights during fprop and bprop in Torch: https://github.com/apaszke/torch-quantize. But the underlying format is still floating point, so the memory footprint is not reduced. Maybe I will use it for some experiments.
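For reference, a small sketch of the difference (the format and names below are just illustrative assumptions): simulated quantization keeps float storage, so to actually cut memory one would store the integer codes in a narrower tensor type together with a scale.

local frac = 8                        -- assumed 16-bit format with 8 fractional bits
local scale = 2 ^ frac
local w = torch.randn(64, 3, 3, 3)    -- float weights

-- Simulated quantization: values lie on the fixed-point grid but are still stored as floats.
local w_sim = (w * scale):round():div(scale)

-- Actual storage reduction: keep only the 16-bit integer codes plus the scale.
local w_int  = (w * scale):round():clamp(-32768, 32767):short()  -- 2 bytes per value
local w_back = w_int:float():div(scale)   -- dequantize again before feeding nn modules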

Best regards,
Min Li

David Gschwend
Jul 19, 2017, 2:02:52 AM
to torch7
Hi Min Li

You are very welcome, I'm glad the project is of use to others! :)
In Ristretto, they also keep the weights and activations in floating point, but quantize them to the desired fixed-point format between the layers. Thus the actual computation (Conv, FC, ...) is not really quantized, but still done in float.
For your hardware, this means that multiplication and accumulation need to use wide enough data formats during the computation (e.g. Mult(16b, 16b) -> 32b, Add(32b, 32b) -> 33b, ...). Then you don't lose any information and you approximate the floating-point computation very closely. The small fixed-point formats are then only used between the layers, i.e. towards the memory, for loading/storing activations and loading weights.
I guess you could even start by just quantizing the weights and activations of the trained float network. If you use a wide enough fixed-point format (e.g. 16 bits with a layer-dependent fraction width), you should see only a negligible accuracy loss in your network. If you try even smaller formats, though, you will probably need to re-train in fixed-point format.
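In Torch, that inter-layer quantization could look roughly like the sketch below (hedged: nn.FixedPointQuantize is a made-up module name and the fraction widths are just examples; this is not Ristretto's API):

require 'nn'

-- Illustrative module: snaps activations onto a fixed-point grid between layers,
-- while the surrounding Conv/FC layers still compute in float (Ristretto-style).
-- Saturation to a fixed integer range is omitted for brevity.
local FixedPointQuantize, parent = torch.class('nn.FixedPointQuantize', 'nn.Module')

function FixedPointQuantize:__init(fracBits)
  parent.__init(self)
  self.scale = 2 ^ fracBits
end

function FixedPointQuantize:updateOutput(input)
  self.output:resizeAs(input):copy(input):mul(self.scale):round():div(self.scale)
  return self.output
end

function FixedPointQuantize:updateGradInput(input, gradOutput)
  -- Straight-through: pass gradients unchanged (only matters if you also train with it).
  self.gradInput:resizeAs(gradOutput):copy(gradOutput)
  return self.gradInput
end

-- Usage: insert between layers, with a layer-dependent fraction width.
local net = nn.Sequential()
  :add(nn.SpatialConvolution(3, 16, 3, 3))
  :add(nn.FixedPointQuantize(12))   -- e.g. 16-bit activations with 12 fraction bits
  :add(nn.ReLU())
  :add(nn.FixedPointQuantize(8))    -- e.g. 16-bit activations with 8 fraction bits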