How to determine the quantization parameters for multiplying two 8bit matrices?


Ning Xu

unread,
Aug 2, 2016, 2:14:09 PM8/2/16
to gemmlowp
I read doc/low-precision.txt, but I am still not sure how the quantization parameters are chosen. Could anyone provide a link or a guideline showing how they are set?

Further, what I ultimately want is a fast approximation to floating-point matrix multiplication: for example, computing C = A * B, where A, B, and C are all floating-point matrices. I can quantize A to 8 bits with respect to its own range, so that A_8bit = round[(A - min_A) * 255 / range_A], where min_A is the minimum entry in A and range_A is the difference between the maximum and minimum entries. I can quantize B to B_8bit the same way. Now I assume I can compute a matrix C_8bit using gemmlowp with some set of quantization parameters (I'm not sure yet how to set these). So the question is: given C_8bit, how do I efficiently compute an approximation of C = A * B? Is anything like this implemented here?
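For concreteness, the quantization scheme I describe can be sketched in NumPy (quantize_8bit is just my own illustrative helper, not a gemmlowp API):

```python
import numpy as np

def quantize_8bit(M):
    """Linearly quantize a float matrix to uint8 over its own range:
    M_8bit = round((M - min_M) * 255 / range_M)."""
    min_M = M.min()
    range_M = M.max() - min_M
    M_8bit = np.round((M - min_M) * 255.0 / range_M).astype(np.uint8)
    return M_8bit, min_M, range_M

A = np.array([[0.0, 1.0],
              [2.0, 4.0]])
A_8bit, min_A, range_A = quantize_8bit(A)
# The minimum entry (0.0) maps to code 0 and the maximum (4.0) to code 255.
```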

Pete Warden

unread,
Aug 2, 2016, 4:53:28 PM8/2/16
to Ning Xu, gemmlowp
Hi Ning,
             here's a brief description of how we use gemmlowp in TensorFlow to do arbitrary multiplications:

 - We assume that A and B are linearly quantized between the real numbers Amin/Amax and Bmin/Bmax respectively, where code 0 represents the float value Amin (or Bmin) and code 255 represents Amax (or Bmax).

 - To get the offset values, we figure out what the value 0.0 would be as a quantized code for each input range, so it can be subtracted from the actual codes before multiplication. There's an example of that here:

 - We then run gemmlowp. The raw result from the sum of 8-bit x 8-bit multiply-adds is a 32-bit number. There are options in gemmlowp to output this directly, or you can specify offset, shift, and multiplier conversion factors to truncate the results to eight bits.

 - The 32-bit results will represent values in a known float range. This is calculated in the QuantizationRangeForMultiplication() function, which works by figuring out the difference between one code level and the next in each input range, and multiplying those deltas together to work out the delta between code levels in the output.
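The offset and output-range steps above can be sketched like this (an assumed illustration, not the actual TensorFlow code; the function names and num_codes parameter are mine):

```python
import numpy as np

def zero_point(min_v, max_v, num_codes=256):
    # Which code represents the real value 0.0 in [min_v, max_v]?
    # This is the offset subtracted from the codes before multiplying.
    return int(round(-min_v * (num_codes - 1) / (max_v - min_v)))

def range_for_multiplication(min_a, max_a, min_b, max_b, num_codes=256):
    # One code level in A spans range_A/255 real units, one in B spans
    # range_B/255; their product is one code level of the int32 output,
    # so the output float range is that delta scaled by the int32 limits.
    a_level = (max_a - min_a) / (num_codes - 1)
    b_level = (max_b - min_b) / (num_codes - 1)
    c_level = a_level * b_level
    i32 = np.iinfo(np.int32)
    return c_level * i32.min, c_level * i32.max
```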

To be honest, I still find it all a bit mind-bending myself and usually end up sitting down with pen and paper to work through the math when I have to come back to it. It's a very general way of representing float values with a limited set of codes, but I find fixed-point is a lot easier to reason about.

Does that help?

Pete


Ning Xu

unread,
Aug 2, 2016, 9:48:13 PM8/2/16
to gemmlowp, ning...@gmail.com
Thanks a lot, Pete. It makes a lot of sense.

So here is what I do for floating point C = A * B:

. A_8bit = round( (A - min_A) * 255 / range_A)
. Offset_A = round( min_A * 255 / range_A )
. B_8bit = round( (B - min_B) * 255 / range_B )
. Offset_B = round( min_B * 255 / range_B)
. C_32bit = (A_8bit + Offset_A) * (B_8bit + Offset_B) ≈ A * B * 255 * 255 / (range_A * range_B), ignoring rounding errors and assuming no overflow
==> C = A * B ≈ C_32bit * (range_A * range_B) / (255 * 255)
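This recipe can be checked numerically in plain NumPy (a hypothetical sketch standing in for gemmlowp's int32 accumulation; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, (4, 5))
B = rng.uniform(-2.0, 3.0, (5, 6))

def quantize(M):
    min_M = M.min()
    range_M = M.max() - min_M
    return np.round((M - min_M) * 255.0 / range_M), min_M, range_M

A8, min_A, range_A = quantize(A)
B8, min_B, range_B = quantize(B)
Offset_A = int(np.round(min_A * 255.0 / range_A))
Offset_B = int(np.round(min_B * 255.0 / range_B))

# (A8 + Offset_A) ~ A * 255 / range_A, so the int32 matrix product
# approximates A @ B scaled by 255^2 / (range_A * range_B).
C_32bit = (A8 + Offset_A).astype(np.int32) @ (B8 + Offset_B).astype(np.int32)
C_approx = C_32bit * (range_A * range_B) / (255.0 * 255.0)

err = np.max(np.abs(C_approx - A @ B))  # small quantization error
```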

I am still not sure how reasonable output quantization parameters can be chosen before we know the minimum value and range of the C matrix. That doesn't seem to matter for my purpose, though :-)
