I am interested in building a library for new hardware that is compatible with quantized TensorFlow/gemmlowp.
Some of gemmlowp's quantization code differs from TensorFlow's quantization code. Should we always assume gemmlowp reflects the newest/best practices for deep learning quantization?
For instance, during requantization of accumulator values, TensorFlow appears to round half up: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/core/kernels/quantization_utils.h#L263
while gemmlowp appears to round half away from zero: https://github.com/google/gemmlowp/blob/master/fixedpoint/fixedpoint.h#L263
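To make sure I'm describing the discrepancy correctly, here is a small standalone sketch (my own code, not taken from either library; the function names are hypothetical) of the two rounding-divide-by-power-of-two behaviours as I understand them:

#include <cstdint>
#include <cstdio>

// Round to nearest, ties toward +infinity ("round half up"), as I read the
// TensorFlow quantization_utils.h requantization path. Assumes exponent >= 1
// and that >> on a negative signed value is an arithmetic shift.
int64_t DivideByPOTRoundHalfUp(int64_t x, int exponent) {
  const int64_t half = int64_t{1} << (exponent - 1);
  return (x + half) >> exponent;  // arithmetic shift == floor division
}

// Round to nearest, ties away from zero, as I read gemmlowp's
// RoundingDivideByPOT in fixedpoint/fixedpoint.h.
int64_t DivideByPOTRoundHalfAwayFromZero(int64_t x, int exponent) {
  const int64_t half = int64_t{1} << (exponent - 1);
  const int64_t nudge = (x >= 0) ? half : 1 - half;
  return (x + nudge) >> exponent;
}

int main() {
  // Negative ties are where the two schemes differ: -3 / 2 = -1.5.
  printf("half up:             %lld\n",
         (long long)DivideByPOTRoundHalfUp(-3, 1));             // prints -1
  printf("half away from zero: %lld\n",
         (long long)DivideByPOTRoundHalfAwayFromZero(-3, 1));   // prints -2
  return 0;
}

So an accumulator of -3 requantized with a right shift of 1 gives -1 under one scheme and -2 under the other; that is the kind of mismatch I want to make sure my hardware handles the "right" way.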
Also, can you comment more generally on success with 8-bit quantization in practice?
In a July GitHub thread (https://github.com/tensorflow/models/issues/1879#issuecomment-314152798), Pete Warden said
“We haven't tried quantization with this model, and I suspect that ResNet-style architectures may not tolerate quantization well from an accuracy standpoint (since they're so deep)”.
Can you comment on any success with gemmlowp and large, high-accuracy models? Is the accuracy concern above just because the model wasn't trained using FakeQuant operations? I know that in a thread from August (https://groups.google.com/forum/#!topic/gemmlowp/LS4Q-mwxoqw) you said
“In practice, retraining for quantization has been key to achieving low enough accuracy degradation in order for quantization to be shippable in applications that I know about.”
So is gemmlowp only targeted at smaller, "mobile-friendly" models like MobileNet, or is it intended to be used for larger, more expensive/accurate models as well?
Thanks!