Khadas VIM3 Pro (with Amlogic 5 TOPS NPU)


Fastel CTO

Apr 14, 2022, 2:32:09 PM
to doubango-ai
Hello! This ARM board has a 5 TOPS onboard NPU, but it seems ultimateALPR doesn't use it through the integrated Tensorflow Lite. I tested the benchmark with the NPU kernel modules loaded and unloaded - the results are the same.
Is there any aarch64 build with full TF support that doesn't require CUDA?

Mamadou DIOP

Apr 14, 2022, 3:11:59 PM
to Fastel CTO, doubango-ai
Hello,

I get this question very often. Below is a copy/paste of my answer to the last person who asked it via email:

{{{

Hello,

Thanks for your interest in our product. Our models are quantized (INT8) but we only run them on the CPU. The Tensorflow-Lite NNAPI delegates that would use the NPU are disabled because most of the time they don't run faster. Check https://www.doubango.org/SDKs/anpr/docs/Improving_the_speed.html#mobiles
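For context on the INT8 point above, per-tensor affine quantization maps float values to int8 through a scale and a zero-point. A minimal sketch of the arithmetic (the scale and range values are illustrative, not taken from the SDK's models):

```python
def quantize(x, scale, zero_point):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to int8 range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float value from the int8 representation."""
    return (q - zero_point) * scale

# Example: activations in [0, 6] (e.g. a ReLU6 output) mapped onto [-128, 127].
scale = 6.0 / 255.0
zero_point = -128
q = quantize(3.0, scale, zero_point)        # int8 value
x = dequantize(q, scale, zero_point)        # close to 3.0, within one quantization step
```

The whole network then runs in int8 arithmetic, which is why a quantized model is already fast on a plain CPU with NEON.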

The pre- and post-processing functions, computer vision methods, etc. are all written in assembler (SIMD NEON) and heavily multi-threaded. The ASM code is open source (https://github.com/DoubangoTelecom/compv). For example, post-processing functions like NMS (Non-Maximum Suppression) are done outside Tensorflow-Lite and written entirely in ASM. Such functions run poorly on a GPU or NPU. The same applies to some pre-processing functions (https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/Jetson.md#pre-processing-operations) included in Tensorflow-Lite that involve a lot of memory access and little computation. We're still not fully using batching, which means we could boost the current fps by 2 to 3 times. The NPU included in your processor cannot run the deep learning models at half of that target speed, and cannot run all the models at the same time (detection, ocr, lpci, vcr, vmmr, vbsr...).
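To illustrate why NMS is a poor fit for an NPU, here is a minimal pure-Python sketch of the classic greedy algorithm (this is not the SDK's ASM implementation; the box format and threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, discard boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # the two overlapping boxes collapse to one; the distant one survives
```

The data-dependent sorting, branching and list filtering above are exactly the kind of control flow that maps well onto a CPU but poorly onto an NPU's matrix engines.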

The best solution we may have is using the CPU and NPU in parallel, like what is done today for x86 platforms (CPU for detection and GPU for OCR in parallel) -> check the comment at https://www.doubango.org/SDKs/anpr/docs/Benchmark.html#amd-ryzen-7-3700x-8-core-cpu-with-rtx-3060-gpu-untuntu-20
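The CPU-plus-accelerator idea above amounts to a two-stage pipeline: one worker runs detection on frame N while another runs OCR on frame N-1, so both units stay busy. A minimal sketch with Python threads and a queue (the stage functions are placeholders, not SDK calls):

```python
import queue
import threading

def detect(frame):
    """Placeholder detection stage (would run on the CPU)."""
    return f"plates_in_{frame}"

def recognize(plates):
    """Placeholder OCR stage (would run on the GPU/NPU)."""
    return f"text_from_{plates}"

def run_pipeline(frames):
    """Overlap the two stages: detection of frame N runs while OCR handles frame N-1."""
    q = queue.Queue(maxsize=2)   # hand-off buffer between the stages
    results = []

    def ocr_worker():
        while True:
            plates = q.get()
            if plates is None:   # sentinel: no more work
                break
            results.append(recognize(plates))

    t = threading.Thread(target=ocr_worker)
    t.start()
    for frame in frames:
        q.put(detect(frame))     # producer: detection feeds the OCR worker
    q.put(None)
    t.join()
    return results
```

With real workloads the throughput approaches that of the slower stage instead of the sum of both, which is the gain the benchmark comment describes.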

Regards,

}}}


Mamadou DIOP

Apr 15, 2022, 2:51:57 AM
to Fastel CTO, doubango-ai
I have ordered a "Khadas VIM3" device and will check the performance. You can star https://github.com/DoubangoTelecom/ultimateALPR-SDK/issues/248 to get notified.



Sent from my Galaxy