NeuroPilot is a collection of software tools and APIs that allow users to create AI applications, based on neural network models, for MediaTek hardware platforms. With NeuroPilot, users can develop and deploy AI applications on edge devices with extremely high efficiency, while also keeping data private.
NeuroPilot Entry Points, shown in red in Figure 1 - NeuroPilot Architecture, represent the different AI model software integration entry points that developers can choose from. See section: NeuroPilot Entry Points.
5b. (Required for Offline Compile) Offline model compilation, performance evaluation, and optimization (Neuron SDK): Compile the model to DLA format using Neuron Compiler (ncc-tflite), and then evaluate performance on a real device using Neuron Runtime Profiler. See NeuronProfiler. Users can also perform additional optimization workflows, such as TCM, GNO, and Compiler Custom API, using Neuron SDK. See section: Developer Tools -> Model Development -> Neuron SDK.
This example uses the MobileNetV1 neural network, an image classification network that is widely used on mobile devices. The network takes images at 224x224 resolution and classifies each of them into one of 1000 different classes.
The objective of this example is to take a trained MobileNetV1 model and produce a .tflite model that is ready to use on a MediaTek Android device. The steps below show how to convert the MobileNetV1 model from TensorFlow to TFLite.
In order to perform this conversion, the user must know which tensors in the network are the inputs and outputs. Because the MobileNetV1 network is a public reference model, we can specify the tensor names directly in this example.
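A minimal conversion sketch is shown below. It uses the standard TensorFlow 1.x TFLiteConverter API rather than any MediaTek-specific tooling, and the file name mobilenet_v1_1.0_224_frozen.pb together with the tensor names input and MobilenetV1/Predictions/Reshape_1 are the values commonly published with the public MobileNetV1 release; adjust them to match the model actually being converted.

```python
# Sketch: convert a frozen MobileNetV1 graph (.pb) to a float .tflite model
# using the standard TensorFlow 1.x converter API. The file name and tensor
# names below are the ones commonly published for the public MobileNetV1
# release; adjust them to match your own model.
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="mobilenet_v1_1.0_224_frozen.pb",     # assumed file name
    input_arrays=["input"],                               # input tensor name
    output_arrays=["MobilenetV1/Predictions/Reshape_1"],  # output tensor name
)
tflite_model = converter.convert()

with open("mobilenet_v1_float.tflite", "wb") as f:
    f.write(tflite_model)
```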
This example shows how to take a quantized network model and produce a model optimized for MediaTek devices. This is often convenient when a quantized model is already available. Some quantized models are tuned to very high accuracy using many re-training iterations, which can require significant time and compute resources. Starting with this kind of model may yield better final accuracy results. In this mode, the input file must be in protobuf format (.pb).
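As an illustration of this flow, using the standard TensorFlow 1.x converter rather than any MediaTek-specific tooling, a quantization-aware-trained frozen graph that already contains FakeQuant nodes can be converted with the inference type set to uint8. The file name, tensor names, and input statistics (mean, standard deviation) below are assumed placeholder values and must match how the model was actually trained.

```python
# Sketch: convert a quantization-aware-trained frozen graph (.pb) that
# already contains FakeQuant nodes into a fully quantized .tflite model,
# using the standard TensorFlow 1.x converter for illustration.
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="mobilenet_v1_quant_frozen.pb",
    input_arrays=["input"],
    output_arrays=["MobilenetV1/Predictions/Reshape_1"],
)
converter.inference_type = tf.uint8
# (mean, std_dev) used to map real-valued inputs to uint8; must match training.
converter.quantized_input_stats = {"input": (128.0, 128.0)}
tflite_model = converter.convert()

with open("mobilenet_v1_quant.tflite", "wb") as f:
    f.write(tflite_model)
```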
In order to run neural network models on an Android device, the network model must be prepared according to the tutorial shown in the 2.3.1. Neural Network Model Creation section. Please read and understand that tutorial before proceeding.
The following sample Java application is a simple timed benchmark for running a MobileNet image classifier model. The sample application follows the Java Native Application development flow described in 2.2.3.1. Android Development. The Android project includes a sample image, which is used as an input to the network. The application invokes the neural network Interpreter, receives the output classification, and reports the latency of the image inference.
The following section describes the major steps of the sample app, to help explain the process of invoking neural network models in Android. The code below can be found in the file app/src/main/java/com/mediatek/nn/benchmark/NNTestBase.java.
Floating-point models require 4 bytes per (color) channel, so the size of the input buffer is larger if the model uses floating point values. Integer models require only 1 byte per channel, and consequently require less memory to store input images.
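As a quick illustration of the sizing arithmetic (not code from the sample app), the sketch below computes both buffer sizes for a 224x224 RGB input, assuming a batch size of 1:

```python
# Sketch: input buffer size for a 224x224 RGB image, batch size 1.
batch, height, width, channels = 1, 224, 224, 3

float_bytes = batch * height * width * channels * 4  # 4 bytes per channel value
int_bytes   = batch * height * width * channels * 1  # 1 byte per channel value

print(float_bytes)  # 602112 bytes for a floating-point model
print(int_bytes)    # 150528 bytes for a quantized (uint8) model
```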
When the interpreter runs, it produces an array as output. This output array is a set of class probabilities that indicate how likely each possible classification is, based on the network evaluation. The class with the highest probability is the class that is reported for the image.
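Selecting the reported class is simply an argmax over that probability array. A short sketch of the idea, shown in Python rather than the app's Java for brevity, with a hypothetical labels list:

```python
# Sketch: pick the most likely class from the interpreter output.
# `probabilities` is the 1000-element output array produced by the model;
# `labels` is a hypothetical list of the 1000 class names.
def top_class(probabilities, labels):
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index]
```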
In order to run neural network models on an Android device, the network model must be prepared according to the tutorial shown in the 2.3.1. Neural Network Model Creation section. Please read and understand that tutorial before proceeding.
In general, nearly all .tflite models will run on Android devices. However, some types of operations in the neural network model may cause large differences in run-time speed, due to special cases of operation support, both in the Android version itself and on a given device. Consult the NeuroPilot Introduction and Platform Specification -> 2. Hardware Support Specification section for more details on device capabilities and operation support.
The most effective way of getting top performance on Android devices is to develop applications using the Android NDK. In this native method of development, users write the app in C++ and call APIs provided by the NDK. NNAPI is one of these APIs, and there is also a TFLite C++ API. This method still provides all the run-time control of the TFLite interpreter, but yields smaller and more compact applications that can be highly tuned for performance.
We provide a sample native application for reference. This application is based on the MobileNetSSD neural network. This network is an object detection network, which takes images as inputs and detects the presence of known objects in each image. Object detection networks can identify an arbitrary number of objects in a single image, including objects that visually overlap one another. The output of this network is a series of bounding boxes, which identify the region of the input image in which each object lies, along with the classification of each object found.
To aid the development of native app code, MediaTek provides a shim API which makes code development easier and faster. This shim layer invokes any required NeuroPilot libraries as well as the TFLite interpreter. The example shown here uses this shim layer.
In this sample application, input images are pre-processed into binary files via the script 0_python_convert_input_2_bin.bat. This script generates binary files which can be directly copied into the neural network input tensor.
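The batch script itself is not reproduced here; the sketch below shows the general idea of this kind of preprocessing, assuming a 224x224 uint8 RGB input tensor and using placeholder file names:

```python
# Sketch: convert an image into a raw binary file whose bytes can be copied
# directly into a 224x224 uint8 RGB input tensor. File names are placeholders.
import numpy as np
from PIL import Image

img = Image.open("input.jpg").convert("RGB").resize((224, 224))
data = np.asarray(img, dtype=np.uint8)   # shape (224, 224, 3), NHWC byte order
data.tofile("input.bin")                 # raw bytes, no header
```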
The entire output processing code in the app is too large to reprint here. Please refer to the ssd_post_process() function inside ssd.cpp for details. This example app follows the common NMS (non-maximum suppression) implementation, which has many references online.
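For readers unfamiliar with NMS, the following is a generic sketch of the greedy algorithm (suppression by IoU threshold), not the code from ssd.cpp:

```python
# Sketch: greedy non-maximum suppression (NMS) over detection boxes.
# boxes: list of (x1, y1, x2, y2); scores: matching confidence values.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep  # indices of the boxes to keep
```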
Converter Tool can convert models from different deep learning training frameworks into a format that can be deployed on MediaTek platforms. Converter Tool handles the variations of both the operator definitions and model representations among different training frameworks, and provides device-independent optimizations to the given model.
In this section, we provide a detailed introduction and some examples of using Converter Tool. Currently, Converter Tool supports TensorFlow v1, TensorFlow v2, PyTorch, and Caffe as the conversion source, and TensorFlow Lite as the conversion target. Converter Tool is also capable of quantizing the model with different configurations, such as 8-bit asymmetric quantization, 16-bit symmetric quantization, or mixed-bit quantization. Post-training quantization can be applied during the conversion process if necessary.
Converter Tool can convert models from different deep learning training frameworks into a format that can be deployed on MediaTek platforms. Converter Tool handles the operator definition variations among different deep learning training frameworks, and can also quantize a floating-point model into an integer-only representation. Users can pass a quantization-aware training result to Converter Tool, configure Converter Tool to do post-training quantization, or apply both of these techniques together.
Certain operators are not included as primitive operators in some training frameworks. In this case, a typical workaround is to use multiple primitive operators to composite the missing operator. Typically, there are a large number of ways (or patterns) to composite the same missing operator, and each pattern gives a different runtime performance after deploying the model on a MediaTek platform.
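As a purely illustrative example (not taken from Converter Tool's documentation), an activation such as hard-swish can be composited from add, relu6, multiply, and divide primitives when it is not available directly:

```python
# Sketch: compositing a hard-swish activation from primitive operators
# (add, relu6, multiply, divide). Purely illustrative; different patterns
# that compute the same function can lead to different runtime performance
# once the model is deployed.
import tensorflow as tf

def hard_swish(x):
    return x * tf.nn.relu6(x + 3.0) / 6.0
```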
When using the NeuroPilot Converter Tool, users can apply quantization-aware training and post-training quantization together. For example, use quantization-aware training first. Then use post-training quantization to deduce the quantization range of the tensors that were missed by the quantization-aware training tool, or for tensors created during the conversion phase.
Converter Tool provides an easy way to do post-training quantization during the conversion process. To do post-training quantization, users only need to prepare a representative dataset for Converter Tool, in order to calibrate the quantization value range for the activation tensors in the model. Converter Tool computes the exponential moving average minimum and maximum range over all the batches in the given dataset, and then uses that as the quantization value ranges of the tensors. Converter Tool also provides many configuration options when doing post-training quantization, including the quantization bitwidth, and asymmetric/symmetric settings.
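The Converter Tool's own calibration interface is described in its API reference; as a point of comparison only, the same concept expressed with the standard TensorFlow Lite converter looks like the following, where the saved-model path and the randomly generated calibration batches are placeholders:

```python
# Sketch: post-training quantization with a representative dataset, shown
# with the standard TensorFlow Lite converter for illustration. The
# saved-model path and the random calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield real input samples so the activation ranges can be calibrated;
    # random data is used here only as a placeholder.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()
```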
The quantization value range, meaning the minimum and maximum value, affects how floating-point values are approximated to integer values in a quantized tensor. The quantization value range can be deduced from multiple sources based on the following precedence, from highest to lowest.
During the conversion process, Converter Tool converts the quantization value range information (i.e. minimum and maximum values) to the zero_point and scale representation. For this reason, these quantization value ranges are typically nudged by a small amount, to ensure that zero_point exists and is an integer value.
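A sketch of this arithmetic for asymmetric 8-bit quantization is shown below; it follows the widely used nudging scheme in which the zero point is rounded and clamped to an integer inside the quantized range, and is an illustration rather than Converter Tool's internal implementation:

```python
# Sketch: derive scale/zero_point from a (min, max) range for asymmetric
# 8-bit quantization, nudging the range so that zero_point is an integer.
def quantization_params(rmin, rmax, qmin=0, qmax=255):
    rmin = min(rmin, 0.0)              # the range must contain zero
    rmax = max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0                    # degenerate all-zero range
    zero_point = int(min(max(round(qmin - rmin / scale), qmin), qmax))
    # Nudged range implied by (scale, zero_point):
    nudged_min = (qmin - zero_point) * scale
    nudged_max = (qmax - zero_point) * scale
    return scale, zero_point, nudged_min, nudged_max
```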
If the quantization bitwidth does not exactly match the bitwidth of the resulting data type, Converter Tool will expand the quantization value range in order to keep the scale the same in the resulting model.
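For example, if a range was calibrated for 4-bit quantization (16 levels) but the data is stored in an 8-bit type (256 levels), the maximum of the range can be extended so that the scale stays unchanged. The sketch below illustrates this, under the simplifying assumption that the range is anchored at its minimum; it is not Converter Tool's internal code:

```python
# Sketch: expand a quantization range calibrated for a smaller bitwidth so
# that the scale stays the same when the data is stored in a wider type.
# Assumes an asymmetric range anchored at rmin, purely for illustration.
def expand_range(rmin, rmax, calibrated_bits, storage_bits):
    scale = (rmax - rmin) / (2 ** calibrated_bits - 1)
    new_rmax = rmin + scale * (2 ** storage_bits - 1)
    return rmin, new_rmax  # same scale, wider representable range
```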
Converter Tool provides two options, use_weights_symmetric_quantization and use_symmetric_quantization. These two options determine whether to use symmetric quantization ranges for weight and activation tensors that have their quantization ranges deduced from post-training quantization. These two converter options do not affect the quantization ranges deduced from the FakeQuantize operators.
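In practical terms, the difference between the two settings is whether the zero point is fixed at zero. The sketch below shows the two range derivations in generic form; it is an illustration rather than the Converter Tool implementation:

```python
# Sketch: asymmetric vs. symmetric range derivation from observed values.
def asymmetric_range(values):
    return min(values), max(values)      # zero_point is derived from rmin

def symmetric_range(values):
    bound = max(abs(min(values)), abs(max(values)))
    return -bound, bound                 # zero_point is fixed at 0
```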