Hi everyone,
We’re excited to announce a new MLIR-based calibrated post-training quantization backend, available in the latest nightly builds. We’re still working on improving the project, but want to highlight this feature now in anticipation of the stable TF 2.5 release.
Better model performance and accuracy: We added canonicalization optimizations to the new quantization conversion passes that progressively simplify the quantized model, removing redundant computations and rescalings to improve both performance and accuracy.
Better test coverage: We have more than 100 model tests verifying that quantization parameters are propagated correctly across tensors and constants. Several long-standing edge-case issues in the old backend (such as shared weight/bias tensors) have been fixed.
TensorFlow node name and Python code tracking during quantization: When errors happen during conversion, we now report the TensorFlow node names and Python source locations involved, so problems can be traced back to the original model code.
Unified execution path for different quantization workflows: From now on, quantization-aware training and post-training calibrated quantization share the same backend code, which helps both workflows produce consistent model numerics.
Single source of truth for op properties: We consolidated all quantization-related op properties and specified them alongside the TFLite op definitions in a single file for easier maintenance.
The feature is enabled by default in the tf-nightly pip package and is active when using calibrated post-training quantization, as in the following example:
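A minimal sketch of calibrated post-training quantization; the tiny Keras model and the random calibration data below are placeholders standing in for your real model and representative dataset:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model purely for illustration; use your own model here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # Yield calibration samples matching the model's input signature.
    # Real calibration data should come from your training/eval set.
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # serialized flatbuffer bytes
```

Providing `representative_dataset` together with `Optimize.DEFAULT` is what triggers the calibrated (full-integer) quantization path, which the new backend handles.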
If you encounter any issues and need the legacy backend, you can opt out with a one-line change by setting the experimental_new_quantizer flag to False on the TFLiteConverter:
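The opt-out looks like the following; `converter` is assumed to be an existing TFLiteConverter instance such as the one constructed above:

```python
import tensorflow as tf

# Toy model just so the converter can be constructed for this sketch.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# The one-line opt-out: fall back to the legacy quantization backend.
converter.experimental_new_quantizer = False
```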
Please submit bugs by creating GitHub issues with the label "comp:lite". Please include:
- The command used to run the converter, or the code if you're using the Python API
- The output from the converter invocation
- The input model to the converter
- If the conversion succeeds but the generated model is wrong, a description of what is wrong:
  - It produces wrong results, or there is a drop in accuracy
  - It produces correct results, but the model is slower or larger than expected (compared to the model generated by the old converter)
This feature is already enabled in the tf-nightly pip package and will be the default in the TensorFlow 2.5 release.
Thanks,
Feng Liu, on behalf of TFLite team