gemmlowp quantization: assumptions on input values


Sudi

Dec 13, 2018, 11:50:34 AM
to gemmlowp
Hello All,
Referring to gemmlowp's quantization step, there is an assumption that we typically map a narrow range of matrix values to a quantized range that is much larger; e.g. [0 .. 255] with uint8 **

Q: For deep NNs, is this assumption always true?
     i.e. are the matrix elements always within a narrow range that is smaller than the quantized range, say [0..255]?
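(For reference, the mapping in question is gemmlowp's affine quantization scheme, which relates real and quantized values as

real_value = scale * (quantized_value - zero_point)

so, for example, quantizing the real interval [0, 6] onto [0..255] gives scale = 6/255 ≈ 0.0235 and zero_point = 0.)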

e.g. let's look at weights and activations:

Weights:
 - several papers/sources on the web indicate that trained weights are typically real numbers in the range (-1, 1)
   (empirical results)
 - however, this is empirical data from existing networks; Q: will it always hold true?

Activation:
 Intermediate Layer:
  - activations of intermediate layers (of a network) are constrained to a smaller range
    (ReLU / ReLU6 / sigmoid etc. applied to the previous layer's output)
 - so, it seems like we're good here?

 Input Layer:
   - for the input layer, the activations may take large values (e.g. raw audio/video data)
   - Q: The assumption that activation values lie in a narrow range may not hold for this layer, right?
          e.g. raw pixel values can range from 0 to 255
==
Any comments/thoughts ..?

Thanks in advance.


** I am specifically referring to equation (7) from the gemmlowp quantization documentation:
result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) * int32_accumulator       (7)

The difficulty here is of course that (lhs_scale * rhs_scale / result_scale) is a positive real number, not an integer in general.

It is a constant, though. So what we have to implement here is the (approximate) scaling of an int32 value by some arbitrary positive constant multiplier.


Moreover, it is safe to assume that this positive constant multiplier is smaller than one — each of the scale values here is typically smaller than one, as we are typically mapping the [0..255] quantized uint8 value range to an interval of real values that is much narrower than that, typically within [-10,10] in most neural networks. For example, a neural network using Relu6 activation functions will typically have real activation values in the interval [0,6].
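
To make that last paragraph concrete, here is a minimal C++ sketch of how such an integer-only rescaling can be implemented, assuming the multiplier is in (0, 1); the function names and the exact rounding are illustrative, not gemmlowp's actual implementation:

#include <cmath>
#include <cstdint>

// Precompute once: represent 'multiplier' (assumed in (0, 1)) as
// quantized_multiplier * 2^-(31 + right_shift), with quantized_multiplier
// an int32 in [2^30, 2^31).
void QuantizeMultiplierSmallerThanOne(double multiplier,
                                      std::int32_t* quantized_multiplier,
                                      int* right_shift) {
  int exponent;
  const double fraction = std::frexp(multiplier, &exponent);  // in [0.5, 1)
  *right_shift = -exponent;                                   // >= 0 here
  std::int64_t q =
      static_cast<std::int64_t>(std::round(fraction * (1ll << 31)));
  if (q == (1ll << 31)) {  // rounding pushed the fraction up to exactly 1.0
    q /= 2;
    --*right_shift;
  }
  *quantized_multiplier = static_cast<std::int32_t>(q);
}

// At runtime, per accumulator:
//   acc * multiplier ~= (acc * quantized_multiplier) / 2^(31 + right_shift)
// computed with a 64-bit product and a rounding right shift.
std::int32_t MultiplyByQuantizedMultiplier(std::int32_t acc,
                                           std::int32_t quantized_multiplier,
                                           int right_shift) {
  const std::int64_t prod =
      static_cast<std::int64_t>(acc) * quantized_multiplier;
  const int total_shift = 31 + right_shift;  // assumed small enough (< 63)
  const std::int64_t rounding = std::int64_t{1} << (total_shift - 1);
  return static_cast<std::int32_t>((prod + rounding) >> total_shift);
}

The precomputation runs once (the multiplier is a constant), and the per-accumulator path uses only an integer multiply and a shift, which is the whole point of equation (7).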

== 

Benoit Jacob

Dec 13, 2018, 12:53:25 PM
to sud...@gmail.com, gemmlowp
On Thu, Dec 13, 2018 at 11:50 AM Sudi <sud...@gmail.com> wrote:
Hello All,
Referring to gemmlowp's quantization step, there is an assumption that we typically map a narrow range of matrix values to a quantized range that is much larger; e.g. [0 .. 255] with uint8 **

Q: For deep NNs, is this assumption always true?
     i.e. are the matrix elements always within a narrow range that is smaller than the quantized range, say [0..255]?

Great questions! Three angles of reply:
(1) Don't mix up scales and multipliers; it is only multipliers that we have in the past required to be <1.
(2) Yes, in the past we required multipliers to be <1, but current gemmlowp no longer does.
(3) There are both good and bad reasons why some multipliers may be >1.

Expanding into more detail:

(1) In the gemmlowp document that you are quoting (the ** footnote above), the value that we have been assuming to be <1 is not any one of the quantization scales, but the following product, which we called the "multiplier":

multiplier = lhs_scale * rhs_scale / result_scale

In practice, in NN applications, the LHS is the weights, the RHS is the input activations, and the result is the output activations, so the above reads:

multiplier = weights_scale * input_activations_scale / output_activations_scale

  It is often, but not always, the case that input and output activations have roughly the same quantization scale, and in that case, the above simplifies to the following approximation:

multiplier ~= weights_scale

So in this common case, it is true that the "multiplier<1" requirement implies approximately "weights_scale<1".
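
For concreteness, a worked example with purely illustrative numbers (not taken from any particular model): with Relu6-style activations on both sides, quantizing the real interval [0, 6] onto [0..255] gives

input_activations_scale = output_activations_scale = 6 / 255 ≈ 0.0235

and quantizing weights from, say, [-1, 1] onto [0..255] gives

weights_scale = 2 / 255 ≈ 0.0078

so

multiplier = weights_scale * input_activations_scale / output_activations_scale ≈ 0.0078

which is comfortably below 1.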


(2) This quantization doc is a bit outdated. Since it was written, a couple of things have happened.  First, we have published this paper, whose Section 2 is largely a refactoring of this doc. However, even that paper still mentions the multiplier<1 assumption (section 2.2, "We empirically find it to always be in the interval (0, 1)").

Later, we actually generalized gemmlowp to allow multipliers greater than 1, by allowing the shift amount to be a left shift instead of a right shift. See OutputStageScaleInt32ByFixedPointAndExponent in gemmlowp, and see how it is used in TFLite.
(3) Why would a multiplier be > 1 in practice and what does that entail?

A multiplier greater than one means that not all of the possible output quantized values will be used.

Example: if multiplier==2 and output_zero_point==0, then only EVEN values will be produced for the output activations, since each output value is the result of multiplying the accumulator value by 2. So for an output type of uint8, instead of using all 8 bits of representation power, only 7 bits will be used (the lowest bit will always be 0).
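
A tiny self-contained illustration of that claim, with assumed values (multiplier == 2, output_zero_point == 0):

#include <cstdint>
#include <cstdio>
#include <set>

int main() {
  std::set<std::uint8_t> used;
  // Sweep accumulator values and record which uint8 output codes appear.
  for (int acc = 0; acc <= 127; ++acc) {
    used.insert(static_cast<std::uint8_t>(2 * acc));  // always an even value
  }
  std::printf("distinct output values: %zu of 256\n", used.size());  // 128
  return 0;
}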

So, multipliers>1 imply non-optimal usage of the output quantized type's representation power. That's not ideal.

In practice this could happen for a number of reasons:
   - (good reason) perhaps some aspect of the NN just requires the kind of min-max ranges that lead to a multiplier being > 1.
   - (good reason) perhaps the output activations use 16-bit, whereas the input activations and weights use 8-bit. In that case, the output scale will typically be 256x finer (smaller), so the multiplier will be 256x larger. This sometimes pushes multipliers over 1. This 8bit x 8bit -> 16bit setup is commonly found in quantized LSTMs (see the worked example after this list).
   - (bad reason) perhaps the NN was improperly quantized, with poorly chosen min-max ranges or without suitable fine-tuning or retraining. In that case, multiplier>1 is merely a symptom of poor quantization.
   - (good reason) as a benign variant of the previous bad reason: even in a good retraining-for-quantization setup, ranges will not yet have converged to good values in the early stages of training, so multipliers will often be >1 there even if they become <1 at later training steps.
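
To make the 16-bit case above concrete (illustrative numbers only): if the output covers the same real range, say [0, 6], then an 8-bit output has result_scale ≈ 6/256 ≈ 0.023 while a 16-bit output has result_scale ≈ 6/65536 ≈ 0.00009, i.e. 256x smaller; since multiplier = lhs_scale * rhs_scale / result_scale, the multiplier grows by the same factor of 256 and can easily end up above 1.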

Hope this helps,
Benoit
 


