Hi there,
From the documentation, I understand that the public gemmlowp interfaces are focused on quantizing existing full-precision neural networks, where explicit (de)quantization steps are needed to enter and exit the low-precision GEMM domain.
However, there is some recent work on training NNs that directly use quantized weights and activations [1,2,3], where one can use 8-bit (signed/unsigned) math directly. I was wondering whether it's possible to use gemmlowp without going through the (de)quantization process at all, for both signed and unsigned 8-bit numbers, and instead read the 32-bit integer accumulators out directly?
For unsigned 8-bit numbers, I guess this corresponds to setting *_offset = 0 and instantiating an empty output pipeline. However, public.md states that only uint8_t is currently supported as the lhs/rhs type, and emulating signed inputs by shifting them with an offset of 128 seems a bit wasteful when the ISA supports signed operations directly.
Thanks in advance!
- Yaman