Using gemmlowp for "pure" uint8_t and int8_t matrix multiplications

Yaman Umuroglu

May 15, 2017, 10:48:08 AM
to gemmlowp
Hi there, 

From the documentation, I understand that the public gemmlowp interfaces are focused on quantizing existing full-precision neural networks, where explicit (de)quantization steps are needed to enter and exit the low-precision GEMM domain.

However, there is some recent work on training NNs that directly use quantized weights and activations [1,2,3], where one can directly use 8-bit (signed/unsigned) math. I was wondering if it's possible to use gemmlowp without going through the (de)quantization process for both signed and unsigned 8-bit numbers, reading out the 32-bit integer accumulators directly?

For unsigned 8-bit numbers I guess this corresponds to setting *_offset = 0 and instantiating an empty output pipeline, but public.md states that only uint8_t is supported as the lhs/rhs type at the moment, and setting offset=128 seems a bit wasteful when the ISA supports signed operations directly.


Thanks in advance!

- Yaman

Benoit Jacob

May 15, 2017, 11:02:42 AM
to Yaman Umuroglu, gemmlowp
On Mon, May 15, 2017 at 10:48 AM, Yaman Umuroglu <malt...@gmail.com> wrote:
> Hi there,
>
> From the documentation, I understand that the public gemmlowp interfaces are focused on quantizing existing full-precision neural networks, where explicit (de)quantization steps are needed to enter and exit the low-precision GEMM domain.
>
> However, there is some recent work on training NNs that directly use quantized weights and activations [1,2,3], where one can directly use 8-bit (signed/unsigned) math. I was wondering if it's possible to use gemmlowp without going through the (de)quantization process for both signed and unsigned 8-bit numbers, reading out the 32-bit integer accumulators directly?
>
> For unsigned 8-bit numbers I guess this corresponds to setting *_offset = 0 and instantiating an empty output pipeline,

Yes, that's correct (with uint8, as you note below). That is exactly what this part of the test covers:
 
> but public.md states that only uint8_t is supported as the lhs/rhs type at the moment, and setting offset=128 seems a bit wasteful when the ISA supports signed operations directly.

(Right --- though note that it's offset = -128, not +128.)

Indeed, there is nonzero overhead here, and as you note, the ISA does support signed operations directly.

For NxN matrices, the overhead of handling the offsets is O(N^2) while the GEMM as a whole is O(N^3), so the overhead is negligible for all but the smallest matrix sizes. The trick that allows only O(N^2) offset-handling overhead is to factor the offsets out of the accumulation, so that their contribution reduces to row/column sums of the operands plus a constant term, all computable in O(N^2).

At the moment, gemmlowp does not offer a way to avoid the operand-offset handling overhead when the offsets are 0 --- unlike the output pipeline, which may be empty and then has no overhead.

On the other hand, if you want to hack around gemmlowp in this direction, you may find this interesting: recently, I found a way to write a much faster kernel when the operands at the kernel level are int8 instead of uint8. That might serve as inspiration for various things you may want to experiment with in this area.

Cheers
Benoit
 

