Hi everyone,
We’re excited to announce a new MLIR-based calibrated post-training quantization backend, available in the latest nightly builds. We’re still working on improving the project, but want to highlight this feature now in anticipation of the stable TF 2.5 release.
Better model performance and accuracy: We added canonicalization optimizations to the new quantization conversion passes that progressively simplify the quantized model, removing redundant computations and rescalings to improve both performance and accuracy.
Better test coverage: We have more than 100 model tests verifying that quantization parameters are propagated correctly across tensors and constants. Several long-standing edge-case issues in the old backend (such as shared weight/bias tensors) have been fixed.
TensorFlow node name and Python code tracking during quantization: When errors happen during conversion, we now report the TensorFlow node names and Python source locations involved, so problems can be traced back to the original model code.
Unified execution path for different quantization workflows: From now on, quantization-aware training and post-training calibrated quantization share the same backend code, which helps both workflows produce consistent model numerics.
Single source of truth for op properties: We consolidated all quantization-related op properties and specified them alongside the TFLite op definitions in a single file for easier maintenance.
The feature is enabled by default in the tf-nightly pip package and is active when using calibrated post-training quantization, as in the following example:
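A minimal sketch of calibrated post-training quantization; the tiny Keras model and the random calibration data below are placeholders standing in for your real model and representative dataset:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model purely for illustration; use your own model here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # Yield calibration samples matching the model's input signature.
    # Real calibration data should come from your training/eval set.
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # serialized flatbuffer bytes
```

Providing `representative_dataset` together with `Optimize.DEFAULT` is what triggers the calibrated (full-integer) quantization path, which the new backend handles.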
If you encounter any issues and need the legacy backend, you can opt out with a one-line change by setting the experimental_new_quantizer flag to False on the TFLiteConverter:
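The opt-out looks like the following; `converter` is assumed to be an existing TFLiteConverter instance such as the one constructed above:

```python
import tensorflow as tf

# Toy model just so the converter can be constructed for this sketch.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# The one-line opt-out: fall back to the legacy quantization backend.
converter.experimental_new_quantizer = False
```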
Please submit bugs by creating GitHub issues with the label "comp:lite". Please include:
- The command used to run the converter, or the code if you're using the Python API
- The output from the converter invocation
- The input model to the converter
- If the conversion succeeds but the generated model is wrong, a description of what is wrong:
  - It produces wrong results, or there is a drop in accuracy
  - It produces correct results, but the model is slower or larger than expected (compared to the model generated by the old converter)
This feature is already enabled in the tf-nightly pip package and will be the default in the TensorFlow 2.5 release.
Thanks,
Feng Liu, on behalf of TFLite team