Mike Bedford

Mar 1, 2024, 4:46:12 AM

to Mamadou DIOP, doubango-ai

Hello Mamadou! Hope you are well. We have a performance question and I know how much you love answering performance questions! haha.

As you know, we use your product on many devices, and we love it and its performance. We typically build the machines ourselves to get the most performance we can, using NVIDIA RTX 4070 GPUs, Intel i7 or i9 processors, etc., and we have been quite successful, getting 200 fps and in most cases 300 fps or more.

However, we wanted to see whether we could get good performance from a manufacturer-supplied server instead of building our own. After checking HP, Dell, etc., the best we found for GPU options at a price point that fits our budget (i.e., not $10,000 per server) was the Supermicro 1U WIO SuperServer. We ordered one to test, with these specs:

Intel Xeon E-2468, 8 cores / 2.6 GHz / 24 MB cache

32 GB DDR5-4800 ECC

NVIDIA Quadro RTX A4000

While this is not exactly the same as an RTX 4070 with an i9/i7, we were hoping that we could at least get close to our current fps with this machine. Maybe not 300 fps or more, but we would be happy with 200 or more.

What we are finding is that we only get 180 fps. Also of concern: we see a huge CPU usage spike, which we don't get with the i7/i9 machines.

We did install the latest NVIDIA drivers, CUDA, cuDNN and the OpenVINO toolkit.

The three general questions are:

1. Going by the specs on paper, should this hardware deliver the performance we are hoping for, in your best estimation?

2. Is there anything you can think of to optimize for this hardware and squeeze the most out of it? For example, certain drivers to install, or configuring UltimateALPR to run OpenVINO, or maybe TRT or TF on the CUDA cores?

3. This may be the cause of the issue: we are seeing an error in the benchmark log, which I have pasted below. Note the section where clGetPlatformIDs fails:

*[ULTALPR_SDK INFO]: Starting benchmark...
*[COMPV INFO]: [UltAlprSdkEngine] Call: ultimateAlprSdk::UltAlprSdkEngine::init
*[COMPV INFO]: [UltAlprSdkEngine] jsonConfig: {"debug_level": "info","debug_write_input_image_enabled": false,"debug_internal_data_path": ".","gpgpu_enabled": true,"max_latency": -1,"klass_vcr_gamma": 1.5,"detect_roi": [0, 0, 0, 0],"detect_minscore": 0.1,"pyramidal_search_enabled": false,"pyramidal_search_sensitivity": 0.28,"pyramidal_search_minscore": 0.8,"pyramidal_search_min_image_size_inpixels": 800,"recogn_minscore": 0.3,"recogn_score_type": "min","assets_folder": "assets","charset": "latin","num_threads": -1,"recogn_rectify_enabled": false,"ienv_enabled": false,"openvino_enabled": true,"openvino_device": "CPU","npu_enabled": true,"asm_enabled": true,"intrin_enabled": true,"klass_lpci_enabled": false,"klass_vcr_enabled": false,"klass_vmmr_enabled": false,"klass_vbsr_enabled": false}
*[COMPV INFO]: /!\ Code in file 'source\ultimate_alpr_sdk_public_engine.cxx' in function 'ultimateAlprSdk::UltAlprSdkEngine::init' starting at line #77: Not optimized -> Code not built for mobile devices but for clouds. Are you sure this is what you want?
*[COMPV INFO]: [UltAlprSdkEngine] **** Copyright (C) 2011-2021 Doubango Telecom <https://www.doubango.org> ****
ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK> version 3.9.0

*[COMPV INFO]: [CompVBase] Initializing [base] modules (v 1.0.0, nt -1)...
*[COMPV INFO]: [CompVBase] sizeof(compv_scalar_t)= #8
*[COMPV INFO]: [CompVBase] sizeof(float)= #4
*[COMPV INFO]: [CompVBase] Windows dwMajorVersion=6, dwMinorVersion=2
*[COMPV INFO]: Initializing window registery
*[COMPV INFO]: [ImageDecoder] Initializing image decoder...
*[COMPV INFO]: [CompVCpu] H: 'GenuineIntel', S: '', M: '', MN: ''
*[COMPV INFO]: [CompVBase] CPU features: (intel);[x86];[x64];mmx;sse;sse2;sse3;ssse3;sse41;sse42;avx;avx2;fma3;erms;bmi1;bmi2;popcnt;cmov;aes;rdrand;
*[COMPV INFO]: [CompVBase] CPU cores: online=#16, conf=#16
*[COMPV INFO]: [CompVBase] CPU cache1: line size: #64B, size :#48KB
*[COMPV INFO]: [CompVBase] CPU Phys RAM size: #32628GB
*[COMPV INFO]: [CompVBase] CPU endianness: LITTLE
*[COMPV INFO]: [CompVBase] Binary type: X86_64
*[COMPV INFO]: [CompVBase] Intrinsic enabled
*[COMPV INFO]: [CompVBase] Assembler enabled
*[COMPV INFO]: [CompVBase] OS name: Windows
*[COMPV INFO]: [CompVBase] Math Fast Trig.: true
*[COMPV INFO]: [CompVBase] Math Fixed Point: true
*[COMPV INFO]: [CompVMathExp] Init
*[COMPV INFO]: [CompVBase] Default alignment: #64
*[COMPV INFO]: [CompVBase] Best alignment: #64
*[COMPV INFO]: [CompVBase] Heap limit: #1670563KB (#1631MB)
*[COMPV INFO]: [CompVParallel] Initializing [parallel] module...
*[COMPV INFO]: /!\ Code in file 'compv_mem.cxx' in function 'compv::CompVMemZero_C' starting at line #508: Not optimized -> No SIMD implementation found
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:0000000000002BB8, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #16 threads/#16 cores
*[COMPV INFO]: [CompVParallel] [Parallel] module initialized
*[COMPV INFO]: [CompVBase] [Base] modules initialized
*[COMPV INFO]: [CompVCore] Initializing [core] module (v 1.0.0)...
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:0000000000003330, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:0000000000002924, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:0000000000003A70, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:0000000000003468, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 1 and name = 'FAST (Features from Accelerated Segment Test)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 8 and name = 'ORB (Oriented FAST and Rotated BRIEF)'...
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=8,set=useless, threadId:0000000000003574, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=10,set=useless, threadId:000000000000117C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=14,set=useless, threadId:0000000000003570, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=15,set=useless, threadId:00000000000020E4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=12,set=useless, threadId:00000000000037E4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 27 and name = 'Sobel edge detector'...
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=11,set=useless, threadId:0000000000002DDC, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=9,set=useless, threadId:0000000000003A34, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:0000000000000E74, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:0000000000002BBC, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:0000000000003994, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 28 and name = 'Scharr edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 29 and name = 'Prewitt edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 20 and name = 'Canny edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 30 and name = 'Hough standard (STD)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 31 and name = 'Kernel-based Hough transform (KHT)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 41 and name = 'Standard Histogram of oriented gradients (S-HOG)'...
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=13,set=useless, threadId:0000000000002668, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVMatcher] Registering matcher factory with id = 0 and name = 'Brute force matcher'...
*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 1 and name = 'PLSL (Parallel Light Speed Labeling)'...
*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 19 and name = 'LMSER (Linear Time Maximally Stable Extremal Regions)'...
*[COMPV INFO]: [CompVGL] Initializing [gl] module (v 1.0.0)...
*[COMPV INFO]: [CompVGL] GL module initialized
*[COMPV INFO]: [CompVGpu] Initializing [gpu] module (v 1.0.0)...
*[COMPV INFO]: [CompVCamera] Initializing [camera] module (v 1.0.0)...
*[COMPV INFO]: [CompVCamera] Camera plugin path: C:\Program Files\iSpy\Plugins\LPR\CompVPluginMFoundation.dll
*[COMPV INFO]: [CompVDrawing] Initializing [drawing] module (v 1.0.0)...
*[COMPV INFO]: [CompVDrawing] /!\ No jpeg decoder found
*[COMPV INFO]: [CompVDrawing] Drawing module initialized
*[COMPV INFO]: [CompVGpu] GPU enabled: true
*[COMPV INFO]: /!\ Code in file 'source\ultimate_base_engine.cxx' in function 'ultimateBase::UltBaseEngine::init' starting at line #82: Not optimized for GPU -> GPGPU computing not enabled or deactivated
*[COMPV INFO]: [UltBaseOpenCL] Trying to load [OpenCL.dll]
*[COMPV INFO]: [CompVSharedLib] Loading sharded library from OpenCL.dll
*[COMPV INFO]: [UltBaseOpenCL] Loaded [OpenCL.dll], looksLikeValid: yes...
***[COMPV ERROR]: function: "ultimateBase::UltBaseOpenCLUtils::init()"
file: "source\opencl\ultimate_base_opencl_utils.cxx"
line: "48"
message: [UltBaseOpenCLUtils] OpenCL operation failed (-1001 -> unknown error code) -> clGetPlatformIDs failed
***[COMPV ERROR]: function: "ultimateBase::UltBaseOpenCLUtils::init()"
file: "source\opencl\ultimate_base_opencl_utils.cxx"
line: "106"
message: Operation Failed (COMPV_ERROR_CODE_E_UNKNOWN) ->
*[COMPV INFO]: [UltBaseOpenCL] Failed to hook functions using [OpenCL.dll] library
*[COMPV INFO]: [UltOcrEngine] Tensorflow version: 2.6.0
*[COMPV INFO]: [UltAlprPlugin] Loading plugin: C:\Program Files\iSpy\Plugins\LPR\ultimatePluginOpenVino.dll ...
*[COMPV INFO]: [CompVSharedLib] Loading sharded library from C:\Program Files\iSpy\Plugins\LPR\ultimatePluginOpenVino.dll
*[PLUGIN_OPENVINO INFO]: DLL_PROCESS_ATTACH
*[COMPV INFO]: [CompVSharedLib] Loaded shared lib: C:\Program Files\iSpy\Plugins\LPR\ultimatePluginOpenVino.dll
*[PLUGIN_OPENVINO INFO]: pluginOpenVinoInferenceEngine::init called
*[COMPV INFO]: [UltAlprSdkEnginePrivate] **** Copyright (C) 2011-2022 Doubango Telecom <https://www.doubango.org> ****
You're using an unlicensed version of ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK>
without the rights to include the SDK in any form of commercial product.
*[COMPV INFO]: [UltAlprSdkEnginePrivate] IC took 127 millis
*[COMPV INFO]: [CompVCpu] Enabling asm code
*[COMPV INFO]: [CompVCpu] Enabling intrinsic code
*[COMPV INFO]: [UltAlprSdkEnginePrivate] recogn_tf_num_threads: 16, acceleration backend: OpenVino
*[COMPV INFO]: [CompVThreadDispatcher] Not optimized -> Your system have #16 cores but you're only using #8. Sad!!
*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #8 threads/#16 cores
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:0000000000003734, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:0000000000002AD0, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:00000000000035D0, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:0000000000003B14, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:0000000000003B1C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:000000000000323C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:000000000000183C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:0000000000003210, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%
*[PLUGIN_OPENVINO INFO]: [7] OpenVINO inference engine number: 2020.3.0-3467-15f2c61a-releases/2020/3, version: 2.1
*[COMPV INFO]: [UltAlprDetector] We have managed to create assets/models/ultimateALPR-SDK_detect_main.desktop.model.doubango detector/classifier for OpenVino. Ignoring Tensorflow detector/classifier.
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #1 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #5 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #6 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel delivery ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #4 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #3 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #2 ***
*[COMPV INFO]: [UltAlprSdkEngine] Call: ultimateAlprSdk::UltAlprSdkEngine::warmUp
*[COMPV INFO]: [UltAlprSdkEnginePrivate] *** Entering parallel process for job #0 ***
*[COMPV INFO]: /!\ Code in file 'source\ultimate_alpr_plugin.cxx' in function 'ultimateAlpr::UltAlprPlugin::process' starting at line #165: Not optimized -> Batching will not be activated for this function
*[COMPV INFO]: /!\ Code in file 'compv_mem.cxx' in function 'compv::CompVMemCopy_C' starting at line #985: Not optimized -> No SIMD implementation found. On ARM consider http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html
*[COMPV INFO]: /!\ Code in file 'intrin\x86\compv_mem_intrin_ssse3.cxx' in function 'compv::CompVMemUnpack3_SrcPtrNotAligned_Intrin_SSSE3' starting at line #69: Not built with SSSE3 support
*[COMPV INFO]: /!\ Code in file 'source\ultimate_text_fuser.cxx' in function 'ultimateText::UltTextFuser::process' starting at line #189: Is for testing and must not be called -> Fragments should be trimmed
*[COMPV INFO]: /!\ Code in file 'math\compv_math_matrix.cxx' in function 'compv::CompVMatrix::mulAtA' starting at line #881: Contains a TODO: -> Deprecated: use CompVMath::mulAB
*[COMPV INFO]: /!\ Code in file 'math\compv_math_matrix.cxx' in function 'compv::CompVMatrixGeneric<double>::transpose' starting at line #619: Not optimized -> No MT implementation could be found
*[COMPV INFO]: /!\ Code in file 'source\ultimate_text_slant.cxx' in function 'ultimateText::UltTextSlant::applyTransformation' starting at line #90: Not optimized -> Bundle homogenous transformation + transpose + mulABt + homogeneousToCartesian2D
*[COMPV INFO]: /!\ Code in file 'math\compv_math_matrix.cxx' in function 'compv::CompVMatrixGeneric<float>::transpose' starting at line #619: Not optimized -> No MT implementation could be found
*[COMPV INFO]: /!\ Code in file 'math\compv_math_transform.cxx' in function 'compv::CompVMathTransformGeneric<float>::homogeneousToCartesian2D' starting at line #98: Not optimized -> No SIMD or GPU implementation found
*[COMPV INFO]: /!\ Code in file 'math\compv_math_matrix.cxx' in function 'compv::CompVMatrixGeneric<float>::invA3x3' starting at line #515: Not optimized -> No SIMD or GPU implementation found.
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 1): {"frame_id":3,"job_idx":1,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 2): {"frame_id":9,"job_idx":5,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 3): {"frame_id":21,"job_idx":6,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 4): {"frame_id":25,"job_idx":4,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 5): {"frame_id":29,"job_idx":3,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 6): {"frame_id":31,"job_idx":2,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 7): {"frame_id":40,"job_idx":0,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 8): {"frame_id":43,"job_idx":1,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 9): {"frame_id":46,"job_idx":5,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 10): {"frame_id":49,"job_idx":6,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 11): {"frame_id":52,"job_idx":4,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 12): {"frame_id":53,"job_idx":3,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 13): {"frame_id":57,"job_idx":2,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 14): {"frame_id":70,"job_idx":0,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 15): {"frame_id":74,"job_idx":1,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 16): {"frame_id":75,"job_idx":5,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 17): {"frame_id":82,"job_idx":6,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 18): {"frame_id":86,"job_idx":4,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 19): {"frame_id":92,"job_idx":3,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: MyUltAlprSdkParallelDeliveryCallback::onNewResult(0, OK, 20): {"frame_id":95,"job_idx":2,"latency":0,"plates":[{"car":{"confidence":100.0,"warpedBox":[83.73318,159.6842,1089.547,159.6842,1089.547,614.2037,83.73318,614.2037]},"confidences":[89.59459,61.54897,90.29333,90.05127,90.1035,89.59459,90.1434,90.55502,89.76238],"text":"3PEDLM*","warpedBox":[821.9879,336.5989,918.3691,336.5989,918.3691,401.3843,821.9879,401.3843]}]}
*[ULTALPR_SDK INFO]: Elapsed time (ALPR) = [[[ 545.592200 millis ]]]
*[ULTALPR_SDK INFO]: result: {"duration":3,"frame_id":99,"latency":0}
*[ULTALPR_SDK INFO]: *** elapsedTimeInMillis: 545.592200, estimatedFps: 183.287078 ***
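
For reference, a log like the one above can be scanned mechanically for its error blocks and the final fps figure. The following is a minimal Python sketch, assuming the layout seen in this log (a "***[COMPV ERROR]" header followed within a few lines by a "message:" line, plus the closing "estimatedFps:" value). The function name summarize_benchmark_log and the SAMPLE excerpt are illustrative and not part of the SDK. As a side note, -1001 is the value of CL_PLATFORM_NOT_FOUND_KHR in the OpenCL ICD extension headers, meaning the loader found no installed OpenCL platform.

```python
import re


def summarize_benchmark_log(text):
    """Collect error messages, the acceleration backend and the estimated fps."""
    lines = text.splitlines()
    errors = []
    for i, line in enumerate(lines):
        if line.startswith("***[COMPV ERROR]"):
            # The human-readable reason sits on a nearby 'message:' line.
            for follow in lines[i + 1:i + 5]:
                if follow.startswith("message:"):
                    errors.append(follow[len("message:"):].strip())
                    break
    fps_match = re.search(r"estimatedFps:\s*([0-9.]+)", text)
    backend_match = re.search(r"acceleration backend:\s*(\w+)", text)
    return {
        "errors": errors,
        "backend": backend_match.group(1) if backend_match else None,
        "fps": float(fps_match.group(1)) if fps_match else None,
    }


# Short excerpt copied from the log above, used only to exercise the function.
SAMPLE = """\
***[COMPV ERROR]: function: "ultimateBase::UltBaseOpenCLUtils::init()"
file: "source\\opencl\\ultimate_base_opencl_utils.cxx"
line: "48"
message: [UltBaseOpenCLUtils] OpenCL operation failed (-1001 -> unknown error code) -> clGetPlatformIDs failed
*[COMPV INFO]: [UltAlprSdkEnginePrivate] recogn_tf_num_threads: 16, acceleration backend: OpenVino
*[ULTALPR_SDK INFO]: *** elapsedTimeInMillis: 545.592200, estimatedFps: 183.287078 ***
"""

summary = summarize_benchmark_log(SAMPLE)
print(summary)
```

Running the same function over the rate 0 and rate 1 logs side by side makes it easy to see whether the OpenCL failure appears in both.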

Thanks!!!
--

Mike Bedford

Ensight Technologies, LLC

mi...@ensight-technologies.com

www.ensight-technologies.com

Mamadou DIOP

Mar 4, 2024, 9:55:30 AM

to Mike Bedford, doubango-ai

Hi Mike,

To get an estimate for the worst-case scenario, you need to run the benchmark app with "--rate 1.0". Check https://github.com/DoubangoTelecom/ultimateALPR-SDK/tree/master/samples/c%2B%2B/benchmark#testing-usage

When OpenVINO and the GPU are correctly configured, we run detection and classification using OpenVINO, and OCR on the GPU.

The first diagnostic is to determine which unit (CPU or GPU) is the bottleneck. You have 2 different configs to compare:

A- "rate==0" means detection=on, ocr=off -> this config can be used to benchmark OpenVINO

B- "rate==1" means detection=on, ocr=on -> this config can be used to benchmark both OpenVINO and GPU

C- (A) gives you the OpenVINO benchmark, (B minus A) gives you the GPU benchmark, and (B) gives you the benchmark for both when used in parallel

You can also force everything to run on GPU by disabling OpenVINO.

Your logs show OpenCL errors. That’s strange, because you should not see them if your GPU is correctly configured. Disable OpenVINO to force everything onto the GPU and check the GPU usage using nvidia-smi. Also check CPU usage when OpenVINO is disabled (it must be very low).
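
As an illustration of the "disable OpenVINO" step, here is a small Python sketch that derives a GPU-only variant of the JSON configuration. The keys are copied from the jsonConfig line echoed in the benchmark log earlier in this thread; the helper name without_openvino is made up for this example, and how the benchmark app itself exposes this option is not shown in the thread, so treat this as a sketch of the config change rather than the exact procedure.

```python
import json

# Subset of the jsonConfig echoed in the benchmark log (other keys omitted).
base_config = {
    "gpgpu_enabled": True,
    "openvino_enabled": True,
    "openvino_device": "CPU",
    "num_threads": -1,
}


def without_openvino(config):
    """Return a copy of the config with OpenVINO turned off (hypothetical helper)."""
    updated = dict(config)
    updated["openvino_enabled"] = False
    return updated


gpu_only_config = without_openvino(base_config)
json_config = json.dumps(gpu_only_config)
print(json_config)
```

Keeping the original dictionary untouched makes it simple to run the benchmark once with each variant and compare the two logs.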

200 fps means 5 ms per image, which is almost impossible: pre-processing (image loading, chroma conversion, image resize, memory transfer [CPU to/from GPU], JSON parsing…) alone will take more than 2 to 3 ms. That’s without doing any ANPR operation.
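
The arithmetic behind that estimate can be written out as follows. This is a minimal sketch that treats frames as processed strictly one after another; the benchmark log shows a parallel pipeline, so measured throughput can be higher than a serial budget suggests. The names frame_budget_ms and PREPROCESS_MS are illustrative.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


# Pre-processing cost range quoted in the reply above (2 to 3 ms).
PREPROCESS_MS = (2.0, 3.0)

budget = frame_budget_ms(200)
remaining = tuple(budget - cost for cost in PREPROCESS_MS)

print(f"budget at 200 fps: {budget:.1f} ms per frame")
print(f"left for detection + OCR: {min(remaining):.1f} to {max(remaining):.1f} ms")
```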

Summary:

- Set rate==1.0 and share logs with OpenVINO on/off

Mike Bedford

Mar 4, 2024, 11:06:02 AM

to Mamadou DIOP, doubango-ai

Thank you for the response and assistance. I have done two benchmarks, one with rate 1.0 and the other with rate 0.0. Both of these log files are attached. What is odd is when benchmarking OpenVino (Rate 0), I don't see any results scrolling, but it does show 240 fps, so it doesn't seem like OpenVino is the bottleneck. Then, when I set the rate to 1, it drops to 80 fps, which suggests that the GPU is the bottleneck, in line with the GPU/OpenCL errors.

I have also attached nvidia-smi from this machine, showing the gpu.

Also, I have attached screenshots of CPU usage for both rates 1 and 0. One of my other concerns is CPU usage spikes to 100% when benchmark is running, as you can see in the graphs. You can see where it is idle around 20% before and after but spikes to 100% in both cases.

So, to summarize what we need to figure out:

1. Why is the GPU performance bad (OpenCL errors)?

2. Why does the CPU take 100% regardless?

Oh, and by the way, I think reaching 200 fps might be impossible, but maybe not (maybe your software is better than you think it is!). I went in to one of our production systems and ran the benchmark (granted, at the default rate of 0.2) and, as you can see, we get over 200 fps!!

Thanks!!

BenchmarkRate0.txt

BenchmarkRate1.txt

BenchmarkCPURate1.PNG

BenchmarkCPURate0.PNG

NvidiaSMI.PNG

Mamadou DIOP

Mar 4, 2024, 11:41:07 AM

to Mike Bedford, doubango-ai

1. Why is the GPU performance bad (OpenCL errors)?

You see these errors when OpenCL drivers are missing. These drivers should be present if all NVIDIA drivers and SDKs are correctly installed and up to date. We load OpenCL at runtime.

2. Why does the CPU take 100% regardless?

Because you’re using all the CPU cores at the maximum, it’s normal to have your CPU at 100%, especially if you don’t have a GPU. If you want to use less CPU, you’ll have to run slower by using fewer cores: https://www.doubango.org/SDKs/anpr/docs/Configuration_options.html#num-threads.

“What is odd is when benchmarking OpenVino (Rate 0), I don't see any results” -> it’s normal, we only show results from OCR. OpenVINO is still used!

At https://github.com/DoubangoTelecom/ultimateALPR-SDK/tree/master/samples/c%2B%2B/benchmark#peformance-numbers, the first row uses an AMD CPU with RTX3060 GPU:

rate=0, fps=162

rate=1, fps=123

In your case:

rate=0, fps=230

rate=1, fps=81

- You’ll notice that you’re way faster at rate=0 which means your CPU is faster.

- A4000 (https://www.techpowerup.com/gpu-specs/rtx-a4000.c3756) has 19TFLOPS perf, RTX3060 (https://www.techpowerup.com/gpu-specs/geforce-rtx-3060-12-gb.c3682) has 12TFLOPS perf. This means your GPU is faster.

+ So, you have a faster GPU and a faster CPU but you only reach 81fps while we reach 123fps. There is clearly a GPU issue.
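
Applying the "(B minus A)" rule from the earlier reply to these figures gives a rough per-frame cost for the OCR/GPU stage. The sketch below assumes per-frame time is simply 1000/fps and that the two stages add up, whereas the pipeline actually runs them in parallel, so the results are indicative at best. The function names are illustrative.

```python
def ms_per_frame(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


def ocr_stage_ms(fps_rate0, fps_rate1):
    """Rough extra per-frame cost when OCR is on (rate=1) versus off (rate=0)."""
    return ms_per_frame(fps_rate1) - ms_per_frame(fps_rate0)


reference = ocr_stage_ms(162, 123)  # AMD CPU + RTX 3060 row quoted above
reported = ocr_stage_ms(230, 81)    # Xeon E-2468 + RTX A4000 figures quoted above

print(f"reference machine: ~{reference:.1f} ms per frame added by OCR")
print(f"reported machine:  ~{reported:.1f} ms per frame added by OCR")
```

Under these assumptions the OCR stage costs roughly 2 ms per frame on the reference machine and roughly 8 ms on the reported one, which is consistent with the conclusion that the GPU path is the part that is misbehaving.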

- Using Anaconda or any other env:

1. Install TensorFlow: pip install tensorflow-gpu==2.6, as your logs show "*[COMPV INFO]: [UltOcrEngine] Tensorflow version: 2.6.0"

2. Run the timing script for CPU and GPU and attach the logs. The script and info are at https://groups.google.com/g/doubango-ai/c/OjMC_cb_CXk/m/yu5QnIEiAgAJ

3. You can also check whether the GPU is correctly configured: https://stackoverflow.com/a/38019608
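
One common way to run such a check from Python is sketched below. It uses the TensorFlow 2.x calls tf.config.list_physical_devices and tf.test.is_built_with_cuda; it is not necessarily what the linked answer or the timing script does, and the function name gpu_report is illustrative. The import is guarded so the script still runs (and says so) in an environment without TensorFlow.

```python
def gpu_report():
    """Describe whether TensorFlow can see a GPU in the current environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed in this environment"
    gpus = tf.config.list_physical_devices("GPU")
    names = ", ".join(device.name for device in gpus) or "none"
    return (
        f"TensorFlow {tf.__version__}, "
        f"built with CUDA: {tf.test.is_built_with_cuda()}, "
        f"visible GPUs: {names}"
    )


print(gpu_report())
```

If the report lists no visible GPU even though nvidia-smi shows the card, the CUDA/cuDNN installation seen by TensorFlow is the first thing to look at.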

Mamadou DIOP

Mar 4, 2024, 11:50:02 AM

to Mike Bedford, doubango-ai

Additional notes:

- your nvidia-smi output does not list 'benchmark.exe' among the processes using the GPU resources

- your nvidia-smi output shows that you have CUDA 12.4. From https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/README.md#migration-to-tensorflow-2x-and-cuda-11x it's noted that Tensorflow 2.6 requires CUDA 11.2. From https://www.tensorflow.org/install/source#gpu, the only Tensorflow version that supports CUDA 12 is Tensorflow 2.15 (the latest)


Mike Bedford

Mar 4, 2024, 12:51:56 PM

to Mamadou DIOP, doubango-ai

To be fair, I don't think I had benchmark running anymore when I ran nvidia-smi. But, this is a good reminder that I should see it listed as an application using the GPU resource (when running at the same time AND when GPU is working as it should).

Understood on the CPU usage. It is a 16 core CPU with 8 efficient and 8 performance cores. I think I should just cut the CPU down in benchmark to use only 8 or something instead of the full 16. Thanks for pointing that out.

As for the other things, thank you very much for all of the help and feedback, I will run through everything you suggested and see what happens. Can I at least use CUDA 11.6? It seems like in the PDF for the A4000 GPU, it lists CUDA 11.6 so I am wondering if I must run that at a minimum for the GPU.

Thanks!

Mamadou DIOP

Mar 4, 2024, 2:14:22 PM

to Mike Bedford, doubango-ai

On 3/4/2024 6:51 PM, Mike Bedford wrote:

To be fair, I don't think I had benchmark running anymore when I ran nvidia-smi. But, this is a good reminder that I should see it listed as an application using the GPU resource (when running at the same time AND when GPU is working as it should).

The benchmark app will wait at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/0f8cdaeed3566ea78731465e4a71cd0ac868b89d/samples/c%2B%2B/benchmark/benchmark.cxx#L352 and ask you to press ENTER to exit. The app will be listed as using the GPU as long as you don't press ENTER because the engine still holds CUDA memory. The engine is destroyed at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/0f8cdaeed3566ea78731465e4a71cd0ac868b89d/samples/c%2B%2B/benchmark/benchmark.cxx#L356

In short: the benchmark app will be listed even after seeing the fps report.
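
To verify this while the benchmark is still waiting for ENTER, the process list can be read from nvidia-smi in a script. A minimal sketch, assuming nvidia-smi is on the PATH and that process paths contain no commas; the helper names are hypothetical.

```python
import csv
import io
import subprocess


def parse_compute_apps(csv_text):
    """Turn 'pid, process_name' CSV rows into (pid, process_name) tuples."""
    apps = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) < 2:
            continue  # blank or malformed line
        apps.append((fields[0].strip(), fields[1].strip()))
    return apps


def is_listed(apps, executable):
    """True if any listed process name ends with `executable` (case-insensitive)."""
    target = executable.lower()
    return any(name.lower().endswith(target) for _, name in apps)


def query_compute_apps():
    """Run nvidia-smi and return the processes that currently hold a compute context."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling `is_listed(query_compute_apps(), "benchmark.exe")` before pressing ENTER should return True when the engine is really holding GPU memory.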

Understood on the CPU usage. It is a 16 core CPU with 8 efficient and 8 performance cores. I think I should just cut the CPU down in benchmark to use only 8 or something instead of the full 16. Thanks for pointing that out.

As for the other things, thank you very much for all of the help and feedback, I will run through everything you suggested and see what happens. Can I at least use CUDA 11.6? It seems like in the PDF for the A4000 GPU, it lists CUDA 11.6 so I am wondering if I must run that at a minimum for the GPU.

A few years ago, when I built Tensorflow from source, the dependency was based on the major version only, which means any 11.x will work. Quote from https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/README.md#migration-to-tensorflow-2x-and-cuda-11x: "Please note that we use CUDA 11.1 instead of 11.2 as suggested at https://www.tensorflow.org/install/source#gpu but both will work." This confirms that any 11.x may work.
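
The major-version rule described above can be written as a small check. This is only a sketch of that rule of thumb, not a statement of what TensorFlow itself verifies; the function names are hypothetical and the version strings are the ones mentioned in this thread.

```python
def cuda_major(version):
    """Extract the major component from a version string such as '11.2' or '12.4.125'."""
    return int(version.strip().split(".")[0])


def same_cuda_major(required, installed):
    """True when the installed CUDA toolkit has the major version the build expects."""
    return cuda_major(required) == cuda_major(installed)


# TensorFlow 2.6 targets CUDA 11.2 according to the links quoted in this thread.
REQUIRED_BY_TF_2_6 = "11.2"

for installed in ("11.1", "11.6", "12.4"):
    if same_cuda_major(REQUIRED_BY_TF_2_6, installed):
        verdict = "same major version, may work"
    else:
        verdict = "major version mismatch"
    print(f"CUDA {installed}: {verdict}")
```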

Mike Bedford

Mar 13, 2024, 11:17:53 AM

to Mamadou DIOP, doubango-ai

Hello Mamadou!

I just wanted to circle back on this issue I emailed about a little over a week ago. I want to let you know that we found the culprit behind the slightly lower performance and, more importantly, the 100% CPU usage.

First of all, we did try the setting "num_cores". What was interesting about that setting is when we set it to the full 16 cores or -1 (same thing), we could see on the CPU graph that all 16 cores were near 100% as expected. However, when we set it to just 8 (and we also tried just 1), we could see in the CPU graph that it was indeed only using the cores that we specified in the setting BUT the performance was NOT impacted. We were getting the same estimated fps (within 2-3 fps), regardless of the setting and how many cores it was using.

That being said, the culprit in general was the CPU. As mentioned in a previous email, it is a Xeon E-2468. While it does have 16 cores, it seems to be a budget chip with very few features. For example, on the Intel website, when looking at the specs for the CPU, there are typically "Advanced features" listed, like neural network accelerator and this CPU does not have many of these features. Furthermore, it is not listed as a supported CPU in the OpenVino specs.

We simply swapped the CPU (left everything else the same, RAM, GPU, etc...) to an Intel Core i7 14th gen and voila! We now get the same, if not better, performance than we do with our current systems. Also, it does this while only using 6% CPU instead of 100% (and again, setting num_cores doesn't seem as critical or have an impact with this CPU).

So, while the Xeon looked decent on paper, it had a huge impact on us. The problem is, the Core CPU is not a server grade CPU and typically not offered with most major server manufacturers. In our test, the Core is not an option with the servers we use but the chipset/socket of the motherboard still allowed us to do this test (Xeon E-2468 and Core i7 both share LGA 1700 socket). Therefore, we need to find a Xeon that is comparable and doesn't have the performance hit.

I do see these Xeon Scalable 4th or 5th gen CPUs (that come in bronze, silver, gold and platinum). Gold and Platinum are very expensive but silver still looks like a good CPU. They are also listed as a supported CPU with OpenVino and they do list that they have a deep learning accelerator. Do you have any experience or reports with these CPUs? I hate to just buy hardware and keep trying it to see, because that gets expensive. I would prefer that someone tell us this CPU works well for them before we buy it.

Thanks!

Mike

Mike Bedford

Mar 19, 2024, 5:06:42 AM

to Mamadou DIOP, doubango-ai

Hello again! Just wanted to see if you had a chance to look through my feedback below and give some guidance on the Xeon E versus Xeon Scalable silver or higher. If you have any experience or thoughts, we would appreciate it before we buy another server in hopes it will work this time.

Thanks!!

Mamadou DIOP

Mar 19, 2024, 9:00:23 PM

to Mike Bedford, doubango-ai

Hi,

Two possible issues:

1/ in OpenVINO: disable OpenVINO and try to see

2/ gpu/cpu synchronisation issue: disable parallel processing and try to see

You said one of the CPUs is "Xeon E-2468" but only referred to the second as "i7/i9". Please provide the complete model so that we can compare the SIMD features. Both have AVX2, but I'm not sure whether the one you refer to as "i7/i9" has AVX512. Check the logs and look for the line "CPU features: …".
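
Comparing the two machines can be done by extracting that line from each log. A minimal sketch, assuming the line holds a semicolon-separated feature list after the text "CPU features:"; the function names are hypothetical.

```python
import re

FEATURES_RE = re.compile(r"CPU features:\s*(.*)")


def cpu_features(log_text):
    """Return the set of feature tokens from the first 'CPU features:' line, or an empty set."""
    match = FEATURES_RE.search(log_text)
    if not match:
        return set()
    return {token.strip() for token in match.group(1).split(";") if token.strip()}


def has_avx512(features):
    """True if any feature token belongs to the AVX-512 family."""
    return any(feature.startswith("avx512") for feature in features)


def only_in_first(log_a, log_b):
    """Features reported in the first log but not in the second, sorted for stable output."""
    return sorted(cpu_features(log_a) - cpu_features(log_b))
```

Running `only_in_first` in both directions on the two benchmark logs shows which SIMD features differ between the CPUs.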

Mamadou DIOP

Mar 19, 2024, 9:09:34 PM

to Mamadou DIOP, Mike Bedford, doubango-ai

On 20 Mar 2024, at 02:00, 'Mamadou DIOP' via doubango-ai <douba...@googlegroups.com> wrote:

Hi,

Two possible issues:

1/ in OpenVINO: disable OpenVINO and try to see
2/ gpu/cpu synchronisation issue: disable parallel processing and try to see

I completely forgot you’re using Windows:

For synchro issue check comment at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/65545feccd49106ec165ced2bb58399512ee361c/samples/c%2B%2B/benchmark/main.cxx#L54 with reference to NVIDIA threads at https://devtalk.nvidia.com/default/topic/494659/execute-kernels-without-100-cpu-busy-wait-/

UltimateMRZ was the first project we released. Then, we completely removed “gpgpu_workload_balancing_enabled” from all projects

Quote from https://groups.google.com/g/doubango-ai/c/-FTFZSiPcaA/m/vyo4EpC4AgAJ:

'''

When we talk about "CPU resources", it's not only about cycles but also about memory and I/O. Memory access (cache read) has a big impact on performance. This is why we have dedicated an entire section (https://www.doubango.org/SDKs/anpr/docs/Memory_management_design.html) about it and we print the cache size in the logs.
If the SDK shares the process with other threads that keep reading and writing to the cache, then this will produce a perf issue (cache eviction, read miss...), i.e. cache pollution. This will not be reflected by the CPU usage. Also note that memory transfer from the CPU to the GPU isn't only about the GPU bandwidth but also has a huge impact on the CPU (I/O). On x64 using CUDA this is reflected by CPU usage peaks when you're about to transfer the memory; I have only noticed it on Windows. This may explain why you see 50% usage on full GPU. Check the link to the NVIDIA forum referenced at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/2ebd6db8170130d17013844c146dc22561d7defd/samples/c%2B%2B/benchmark/main.cxx#L56. On ultimateMRZ (another product) we have an option to mitigate the effect (https://www.doubango.org/SDKs/mrz/docs/Configuration_options.html#gpgpu-workload-balancing-enabled); unfortunately we don't have that option in ultimateALPR.

'''


Mike Bedford

Apr 17, 2024, 7:01:15 AM

to Mamadou DIOP, doubango-ai

Hello again!

I wanted to find this older thread and bring it back up. We think we found a pretty serious issue that might be causing some of our issues/questions about performance with these Xeon processors, particularly the newer generation devices.

In this thread, we had tried a lower grade Xeon processor only to find it does not compare with a similar machine that is a core i7 CPU. So, we are now testing a different machine which has a higher end, Xeon Scalable Silver 4th Gen CPU. These CPUs are built to have deep learning boost features, using OpenVino. So, we expect (when comparing the specs) that it will perform the same as the core i7 or actually, a little better.

However, when we got the machine, we were disappointed to find it does not perform well at all.

The major thing we noticed:

1. On our Core i7 machines, if we turn on/off the "openvino_enabled" flag on benchmark, we can see a difference in the fps which means that OpenVino is working.

2. On the new Xeon Scalable, whether we turn OpenVino on or off in the benchmark, it makes no difference in the fps at all. This shows us that OpenVino is not working.

3. Upon digging in to it more, we see that you provide the OpenVino dll with the SDK (ultimatePluginOpenVINO.dll) so we don't need to have OpenVino installed on the machine. We see this in the benchmark console output as well.

4. However, it seems that the version included in the SDK is very old, this is the output in benchmark: 2020.3.0-3467-15f2c61a-releases/2020/3, version: 2.1

5. This is a problem because according to the system requirements for Intel OpenVino, the newer generation CPUs which include our Xeon Scalable Gen 4, are not listed as supported by this old version. This most likely explains why on the core i7 that we use, it seems that OpenVino makes a difference and we get good performance but on the new Xeon Gen 4, it is poor performance and no difference on OpenVino.

6. We were hoping that maybe you use the packaged one by default and if we explicitly installed a newer OpenVino, the SDK would pick that up and use it but this doesn't seem to be the case. We followed the OpenVino runtime installation as well as added the PATH variables to the system so it should have been found if you were doing that.

Any thoughts on this and if our assumption is correct, can we request that the OpenVino version in the SDK be updated to take advantage of newer Intel CPUs please?

Thanks!

Mike

Mike Bedford

VP Technology

EnSight Technologies

M: +1.951.490.5984

E: mi...@ensight-technologies.com

www.EnSight-Technologies.com

Mike Bedford

Apr 18, 2024, 9:54:29 PM

to Mamadou DIOP, doubango-ai

Hello,

We need to hear back on this issue. We are waiting to order 26 servers, which will result in 26 licenses. I cannot validate these servers with this open question.

Thanks!

Mike

Mamadou DIOP

Apr 18, 2024, 11:43:49 PM

to Mike Bedford, doubango-ai

Hello,

I need logs to assess potential issue in OpenVINO.

Set "rate=0" and take logs with OpenVINO "on" and "off" on a machine where you suspect there is an issue.
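
Once the two console outputs are saved to files, the comparison can be scripted. A minimal sketch, assuming the benchmark prints a final summary containing `estimatedFps: <number>`; the function names are hypothetical, and the file names in the note below are placeholders.

```python
import re

FPS_RE = re.compile(r"estimatedFps:\s*([0-9]+(?:\.[0-9]+)?)")


def estimated_fps(log_text):
    """Return the last estimatedFps value found in a benchmark log, or None if absent."""
    values = FPS_RE.findall(log_text)
    return float(values[-1]) if values else None


def relative_gain(fps_off, fps_on):
    """Relative fps change when the option is switched on (0.10 means 10% faster)."""
    return (fps_on - fps_off) / fps_off


def compare_logs(log_off, log_on):
    """Return (fps_off, fps_on, gain); gain is None when either value is missing."""
    fps_off = estimated_fps(log_off)
    fps_on = estimated_fps(log_on)
    if fps_off is None or fps_on is None:
        return fps_off, fps_on, None
    return fps_off, fps_on, relative_gain(fps_off, fps_on)
```

For example, read `openvino_off.txt` and `openvino_on.txt` with `open(...).read()` and pass both strings to `compare_logs`. A gain close to zero suggests the OpenVINO path is not making a difference on that machine.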

Mike Bedford

Apr 19, 2024, 11:13:47 AM

to Mamadou DIOP, doubango-ai

Thank you for the time. As you requested, here are the logs. First log is with OpenVino off, 24fps. Second is with it on, 28fps. As you will note, there are also OpenVino load errors in the second log. I did just download the SDK recently and am running the benchmark in the SDK folder that I just downloaded. I don't see that OpenVino error on any of our working machines, that use the Core i7 CPU.

OpenVino off:

benchmark.exe --positive ../../../assets/images/lic_us_1280x720.jpg --negative ../../../assets/images/london_traffic.jpg --assets ../../../assets --ienv_enabled false --openvino_enabled false --gpgpu_enabled true --openvino_device CPU --klass_lpci_enabled false --klass_vcr_enabled false --klass_vmmr_enabled false --klass_vbsr_enabled false --charset latin --loops 100 --rate 0.0 --parallel true
2024-04-19 08:09:58.539119: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-04-19 08:09:58.540083: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

*[ULTALPR_SDK INFO]: Starting benchmark...
*[COMPV INFO]: [UltAlprSdkEngine]Call: ultimateAlprSdk::UltAlprSdkEngine::init

*[COMPV INFO]: [UltAlprSdkEngine]jsonConfig: {"debug_level": "info","debug_write_input_image_enabled": false,"debug_internal_data_path": ".","gpgpu_enabled": true,"max_latency": -1,"klass_vcr_gamma": 1.5,"detect_roi": [0, 0, 0, 0],"detect_minscore": 0.1,"pyramidal_search_enabled": false,"pyramidal_search_sensitivity": 0.28,"pyramidal_search_minscore": 0.8,"pyramidal_search_min_image_size_inpixels": 800,"recogn_minscore": 0.3,"recogn_score_type": "min","assets_folder": "../../../assets","charset": "latin","num_threads": -1,"recogn_rectify_enabled": false,"ienv_enabled": false,"openvino_enabled": false,"openvino_device": "CPU","npu_enabled": true,"asm_enabled": true,"intrin_enabled": true,"klass_lpci_enabled": false,"klass_vcr_enabled": false,"klass_vmmr_enabled": false,"klass_vbsr_enabled": false}
*[COMPV INFO]: [UltAlprSdkEngine]**** Copyright (C) 2011-2023 Doubango Telecom <https://www.doubango.org> ****
ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK> version 3.11.1

*[COMPV INFO]: [CompVBase] Initializing [base] modules (v 1.0.0, nt -1)...
*[COMPV INFO]: [CompVBase] sizeof(compv_scalar_t)= #8
*[COMPV INFO]: [CompVBase] sizeof(float)= #4
*[COMPV INFO]: [CompVBase] Windows dwMajorVersion=6, dwMinorVersion=2
*[COMPV INFO]: Initializing window registery
*[COMPV INFO]: [ImageDecoder] Initializing image decoder...
*[COMPV INFO]: [CompVCpu] H: 'GenuineIntel', S: '', M: '', MN: ''

*[COMPV INFO]: [CompVBase] CPU features: (intel);[x86];[x64];mmx;sse;sse2;sse3;ssse3;sse41;sse42;avx;avx2;fma3;erms;bmi1;bmi2;popcnt;cmov;aes;rdrand;avx512_f;avx512_cd;avx512_vl;avx512_bw;avx512_dq;avx512_ifma;avx512_vbmi;
*[COMPV INFO]: [CompVBase] CPU cores: online=#20, conf=#20

*[COMPV INFO]: [CompVBase] CPU cache1: line size: #64B, size :#48KB

*[COMPV INFO]: [CompVBase] CPU Phys RAM size: #32468GB

*[COMPV INFO]: [CompVBase] CPU endianness: LITTLE
*[COMPV INFO]: [CompVBase] Binary type: X86_64
*[COMPV INFO]: [CompVBase] Intrinsic enabled
*[COMPV INFO]: [CompVBase] Assembler enabled
*[COMPV INFO]: [CompVBase] OS name: Windows
*[COMPV INFO]: [CompVBase] Math Fast Trig.: true
*[COMPV INFO]: [CompVBase] Math Fixed Point: true
*[COMPV INFO]: [CompVMathExp] Init
*[COMPV INFO]: [CompVBase] Default alignment: #64
*[COMPV INFO]: [CompVBase] Best alignment: #64

*[COMPV INFO]: [CompVBase] Heap limit: #1662363KB (#1623MB)

*[COMPV INFO]: [CompVParallel] Initializing [parallel] module...
*[COMPV INFO]: /!\ Code in file 'compv_mem.cxx' in function 'compv::CompVMemZero_C' starting at line #508: Not optimized -> No SIMD implementation found

*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #20 threads/#20 cores

*[COMPV INFO]: [CompVParallel] [Parallel] module initialized
*[COMPV INFO]: [CompVBase] [Base] modules initialized
*[COMPV INFO]: [CompVCore] Initializing [core] module (v 1.0.0)...

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 1 and name = 'FAST (Features from Accelerated Segment Test)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 8 and name = 'ORB (Oriented FAST and Rotated BRIEF)'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:000000000000192C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:0000000000001BE0, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=9,set=useless, threadId:0000000000002438, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 27 and name = 'Sobel edge detector'...

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 28 and name = 'Scharr edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 29 and name = 'Prewitt edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 20 and name = 'Canny edge detector'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 30 and name = 'Hough standard (STD)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 31 and name = 'Kernel-based Hough transform (KHT)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 41 and name = 'Standard Histogram of oriented gradients (S-HOG)'...

*[COMPV INFO]: [CompVMatcher] Registering matcher factory with id = 0 and name = 'Brute force matcher'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=8,set=useless, threadId:0000000000001390, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:0000000000001E4C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=11,set=useless, threadId:0000000000002418, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:0000000000001634, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=16,set=useless, threadId:00000000000003D8, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:0000000000002550, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 1 and name = 'PLSL (Parallel Light Speed Labeling)'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=14,set=useless, threadId:000000000000344C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=17,set=useless, threadId:000000000000175C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=10,set=useless, threadId:0000000000002B50, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=18,set=useless, threadId:0000000000000BA8, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:00000000000018E0, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=19,set=useless, threadId:00000000000006C0, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 19 and name = 'LMSER (Linear Time Maximally Stable Extremal Regions)'...
*[COMPV INFO]: [CompVGL] Initializing [gl] module (v 1.0.0)...
*[COMPV INFO]: [CompVGL] GL module initialized
*[COMPV INFO]: [CompVGpu] Initializing [gpu] module (v 1.0.0)...
*[COMPV INFO]: [CompVCamera] Initializing [camera] module (v 1.0.0)...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:0000000000003134, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:0000000000001C98, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=13,set=useless, threadId:0000000000001874, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=12,set=useless, threadId:0000000000000E1C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=15,set=useless, threadId:000000000000265C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVCamera] Camera plugin path: C:\Users\ENSIGHT\Downloads\ultimateALPR-SDK-master\ultimateALPR-SDK-master\binaries\windows\x86_64\CompVPluginMFoundation.dll

*[COMPV INFO]: [CompVDrawing] Initializing [drawing] module (v 1.0.0)...
*[COMPV INFO]: [CompVDrawing] /!\ No jpeg decoder found
*[COMPV INFO]: [CompVDrawing] Drawing module initialized
*[COMPV INFO]: [CompVGpu] GPU enabled: true

*[COMPV INFO]: /!\ Code in file 'source\ultimate_base_engine.cxx' in function 'ultimateBase::UltBaseEngine::init' starting at line #75: Not optimized for GPU -> GPGPU computing not enabled or deactivated

*[COMPV INFO]: [UltBaseOpenCL] Trying to load [OpenCL.dll]
*[COMPV INFO]: [CompVSharedLib] Loading sharded library from OpenCL.dll
*[COMPV INFO]: [UltBaseOpenCL] Loaded [OpenCL.dll], looksLikeValid: yes...

*[COMPV INFO]: [UltBaseOpenCLUtils] Selected platform vendor: NVIDIA Corporation
*[COMPV INFO]: [UltBaseOpenCLUtils] deviceCount=1
*[COMPV INFO]: [UltBaseOpenCLUtils] Device -> name: NVIDIA RTX A4000, id: 000001FD8101EC20
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT=1
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE=1
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_COMPUTE_UNITS=48
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=3
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_ITEM_SIZES=1024, 1024, 64,
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_GROUP_SIZE=1024
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_CLOCK_FREQUENCY=1560 MHz
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE=128 B
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_GLOBAL_MEM_SIZE=17170956288 B (16375 MB)
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_LOCAL_MEM_SIZE=49152 B (48 KB)
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_MEM_ALLOC_SIZE=4093 MB
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_PLATFORM_VERSION=OpenCL 3.0 CUDA 12.4.125
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_VERSION=OpenCL 3.0 CUDA
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DRIVER_VERSION=551.86
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_OPENCL_C_VERSION=OpenCL C 1.2
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_EXTENSIONS=cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
*[COMPV INFO]: [UltBaseOpenCL] !!!Booom!!!, OpenCL successfully loaded [OpenCL.dll]

*[COMPV INFO]: [UltOcrEngine] Tensorflow version: 2.6.0

You're using an unlicensed version of ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK>
without the rights to include the SDK in any form of commercial product.

*[COMPV INFO]: [UltAlprSdkEnginePrivate]IC took 224 millis

*[COMPV INFO]: [CompVCpu] Enabling asm code
*[COMPV INFO]: [CompVCpu] Enabling intrinsic code

*[COMPV INFO]: [UltAlprSdkEnginePrivate]recogn_tf_num_threads: 20, acceleration backend: null
*[COMPV INFO]: [CompVThreadDispatcher] Not optimized -> Your system have #20 cores but you're only using #8. Sad!!
*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #8 threads/#20 cores
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:00000000000017C4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:0000000000001578, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:00000000000031B4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:00000000000020E4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:0000000000000674, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:0000000000001B84, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:0000000000001720, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:0000000000003664, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

2024-04-19 08:09:59.108020: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-19 08:09:59.140465: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-04-19 08:09:59.141742: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2024-04-19 08:09:59.143607: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2024-04-19 08:09:59.145317: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2024-04-19 08:09:59.147338: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2024-04-19 08:09:59.149594: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2024-04-19 08:09:59.153053: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2024-04-19 08:09:59.153570: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2024-04-19 08:09:59.154415: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

2024-04-19 08:09:59.545454: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

*[COMPV INFO]: [UltAlprSdkEngine]Call: ultimateAlprSdk::UltAlprSdkEngine::warmUp

*[COMPV INFO]: /!\ Code in file 'source\ultimate_alpr_detector.cxx' in function 'ultimateAlpr::UltAlprDetector::process' starting at line #29: Not optimized -> Batching will not be activated for this function
*[COMPV INFO]: /!\ Code in file 'source\ultimate_ocr_tensorflow_session_detect.cxx' in function 'ultimateOcr::UltOcrTensorflowSessionDetector::processInternal' starting at line #116: Not optimized -> Batching not supported for this function

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #3 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #6 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #2 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #5 ***

*[COMPV INFO]: /!\ Code in file 'compv_mem.cxx' in function 'compv::CompVMemCopy_C' starting at line #985: Not optimized -> No SIMD implementation found. On ARM consider http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #1 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #0 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #4 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel delivery ***
*[ULTALPR_SDK INFO]: Elapsed time (ALPR) = [[[ 4085.098300 millis ]]]
*[ULTALPR_SDK INFO]: result: {"duration":29,"frame_id":99,"latency":0}
*[ULTALPR_SDK INFO]: *** elapsedTimeInMillis: 4085.098300, estimatedFps: 24.479215 ***
*[ULTALPR_SDK INFO]: Press any key to terminate !!
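
The TensorFlow warnings in the log above ("Could not load dynamic library ..." followed by "Skipping registering GPU devices...") indicate that TensorFlow did not register the GPU for this run. A minimal sketch for pulling the missing library names out of a saved log, so they can be checked against what is on the PATH; the function names are hypothetical.

```python
import re

MISSING_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the distinct library names TensorFlow failed to load, in first-seen order."""
    seen = []
    for name in MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def gpu_registration_skipped(log_text):
    """True if TensorFlow reported that it skipped registering GPU devices."""
    return "Skipping registering GPU devices" in log_text
```

On the log above, `missing_libraries` would return the CUDA 11 and cuDNN 8 DLL names that TensorFlow 2.6 looked for and did not find.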

OpenVino on:

benchmark.exe --positive ../../../assets/images/lic_us_1280x720.jpg --negative ../../../assets/images/london_traffic.jpg --assets ../../../assets --ienv_enabled false --openvino_enabled true --gpgpu_enabled true --openvino_device CPU --klass_lpci_enabled false --klass_vcr_enabled false --klass_vmmr_enabled false --klass_vbsr_enabled false --charset latin --loops 100 --rate 0.0 --parallel true
2024-04-19 08:04:38.482442: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-04-19 08:04:38.483268: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

*[ULTALPR_SDK INFO]: Starting benchmark...
*[COMPV INFO]: [UltAlprSdkEngine]Call: ultimateAlprSdk::UltAlprSdkEngine::init

*[COMPV INFO]: [UltAlprSdkEngine]jsonConfig: {"debug_level": "info","debug_write_input_image_enabled": false,"debug_internal_data_path": ".","gpgpu_enabled": true,"max_latency": -1,"klass_vcr_gamma": 1.5,"detect_roi": [0, 0, 0, 0],"detect_minscore": 0.1,"pyramidal_search_enabled": false,"pyramidal_search_sensitivity": 0.28,"pyramidal_search_minscore": 0.8,"pyramidal_search_min_image_size_inpixels": 800,"recogn_minscore": 0.3,"recogn_score_type": "min","assets_folder": "../../../assets","charset": "latin","num_threads": -1,"recogn_rectify_enabled": false,"ienv_enabled": false,"openvino_enabled": true,"openvino_device": "CPU","npu_enabled": true,"asm_enabled": true,"intrin_enabled": true,"klass_lpci_enabled": false,"klass_vcr_enabled": false,"klass_vmmr_enabled": false,"klass_vbsr_enabled": false}
*[COMPV INFO]: [UltAlprSdkEngine]**** Copyright (C) 2011-2023 Doubango Telecom <https://www.doubango.org> ****
ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK> version 3.11.1

*[COMPV INFO]: [CompVBase] CPU cache1: line size: #64B, size :#48KB

*[COMPV INFO]: [CompVBase] CPU Phys RAM size: #32468GB

*[COMPV INFO]: [CompVBase] Heap limit: #1662363KB (#1623MB)

*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #20 threads/#20 cores

*[COMPV INFO]: [CompVParallel] [Parallel] module initialized
*[COMPV INFO]: [CompVBase] [Base] modules initialized
*[COMPV INFO]: [CompVCore] Initializing [core] module (v 1.0.0)...

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 1 and name = 'FAST (Features from Accelerated Segment Test)'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:0000000000001E60, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:0000000000001880, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:0000000000000D88, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 8 and name = 'ORB (Oriented FAST and Rotated BRIEF)'...

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 27 and name = 'Sobel edge detector'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:0000000000001724, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=10,set=useless, threadId:0000000000000C04, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:0000000000000F54, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=12,set=useless, threadId:00000000000008C4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:0000000000003514, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=13,set=useless, threadId:0000000000001408, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 31 and name = 'Kernel-based Hough transform (KHT)'...
*[COMPV INFO]: [CompVFeature] Registering feature factory with id = 41 and name = 'Standard Histogram of oriented gradients (S-HOG)'...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=17,set=useless, threadId:0000000000000830, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=11,set=useless, threadId:0000000000001554, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:00000000000031A8, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=16,set=useless, threadId:0000000000000AEC, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=8,set=useless, threadId:00000000000031E0, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=9,set=useless, threadId:0000000000002128, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=14,set=useless, threadId:00000000000027F8, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVMatcher] Registering matcher factory with id = 0 and name = 'Brute force matcher'...
*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 1 and name = 'PLSL (Parallel Light Speed Labeling)'...
*[COMPV INFO]: [CompVConnectedComponentLabeling] Registering connected component labeling factory with id = 19 and name = 'LMSER (Linear Time Maximally Stable Extremal Regions)'...
*[COMPV INFO]: [CompVGL] Initializing [gl] module (v 1.0.0)...
*[COMPV INFO]: [CompVGL] GL module initialized
*[COMPV INFO]: [CompVGpu] Initializing [gpu] module (v 1.0.0)...

*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=18,set=useless, threadId:0000000000002FCC, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=15,set=useless, threadId:0000000000000740, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=19,set=useless, threadId:0000000000002E58, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:0000000000001E70, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [CompVCamera] Initializing [camera] module (v 1.0.0)...

*[COMPV INFO]: [CompVCamera] Camera plugin path: C:\Users\ENSIGHT\Downloads\ultimateALPR-SDK-master\ultimateALPR-SDK-master\binaries\windows\x86_64\CompVPluginMFoundation.dll

*[COMPV INFO]: [UltBaseOpenCLUtils] Selected platform vendor: NVIDIA Corporation
*[COMPV INFO]: [UltBaseOpenCLUtils] deviceCount=1
*[COMPV INFO]: [UltBaseOpenCLUtils] Device -> name: NVIDIA RTX A4000, id: 00000199242FDF00
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT=1
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE=1
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_COMPUTE_UNITS=48
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=3
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_ITEM_SIZES=1024, 1024, 64,
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_WORK_GROUP_SIZE=1024
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_CLOCK_FREQUENCY=1560 MHz
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE=128 B
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_GLOBAL_MEM_SIZE=17170956288 B (16375 MB)
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_LOCAL_MEM_SIZE=49152 B (48 KB)
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_MAX_MEM_ALLOC_SIZE=4093 MB
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_PLATFORM_VERSION=OpenCL 3.0 CUDA 12.4.125
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_VERSION=OpenCL 3.0 CUDA
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DRIVER_VERSION=551.86
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_OPENCL_C_VERSION=OpenCL C 1.2
*[COMPV INFO]: [UltBaseOpenCLUtils] CL_DEVICE_EXTENSIONS=cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
*[COMPV INFO]: [UltBaseOpenCL] !!!Booom!!!, OpenCL successfully loaded [OpenCL.dll]

*[COMPV INFO]: [UltOcrEngine] Tensorflow version: 2.6.0

*[COMPV INFO]: [UltAlprPlugin] Loading plugin: C:\Users\ENSIGHT\Downloads\ultimateALPR-SDK-master\ultimateALPR-SDK-master\binaries\windows\x86_64\ultimatePluginOpenVino.dll ...
*[COMPV INFO]: [CompVSharedLib] Loading sharded library from C:\Users\ENSIGHT\Downloads\ultimateALPR-SDK-master\ultimateALPR-SDK-master\binaries\windows\x86_64\ultimatePluginOpenVino.dll
***[COMPV ERROR]: function: "compv::CompVSharedLib::open()"
file: "compv_sharedlib.cxx"
line: "69"
message: [CompVSharedLib] Failed to load library with path=C:\Users\ENSIGHT\Downloads\ultimateALPR-SDK-master\ultimateALPR-SDK-master\binaries\windows\x86_64\ultimatePluginOpenVino.dll, Error: 0x0000007e
***[COMPV ERROR]: function: "compv::CompVSharedLib::open()"
file: "compv_sharedlib.cxx"
line: "73"
message: Operation Failed (COMPV_ERROR_CODE_E_NOT_FOUND) ->
***[COMPV ERROR]: function: "compv::CompVSharedLib::newObj()"
file: "compv_sharedlib.cxx"
line: "119"
message: Operation Failed (COMPV_ERROR_CODE_E_NOT_FOUND) ->
***[COMPV ERROR]: function: "ultimateAlpr::UltAlprPlugin::init()"
file: "source\ultimate_alpr_plugin.cxx"
line: "96"
message: Operation Failed (COMPV_ERROR_CODE_E_NOT_FOUND) ->
*[COMPV INFO]: [UltAlprSdkEnginePrivate]**** Copyright (C) 2011-2023 Doubango Telecom <https://www.doubango.org> ****

You're using an unlicensed version of ultimateALPR-SDK <https://github.com/DoubangoTelecom/ultimateALPR-SDK>
without the rights to include the SDK in any form of commercial product.

*[COMPV INFO]: [UltAlprSdkEnginePrivate]IC took 225 millis

*[COMPV INFO]: [CompVCpu] Enabling asm code
*[COMPV INFO]: [CompVCpu] Enabling intrinsic code

*[COMPV INFO]: [UltAlprSdkEnginePrivate]recogn_tf_num_threads: 20, acceleration backend: OpenVino
*[COMPV INFO]: [CompVThreadDispatcher] Not optimized -> Your system have #20 cores but you're only using #8. Sad!!
*[COMPV INFO]: [CompVThreadDispatcher] Thread dispatcher created with #8 threads/#20 cores
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=1,set=useless, threadId:0000000000002D70, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=0,set=useless, threadId:0000000000003578, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=2,set=useless, threadId:0000000000002EB4, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=5,set=useless, threadId:0000000000001904, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=4,set=useless, threadId:0000000000000D44, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=6,set=useless, threadId:000000000000366C, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=7,set=useless, threadId:0000000000002244, kThreadSetAffinity:false) - ENTER
*[COMPV INFO]: [CompVAsyncTask11] compv::CompVAsyncTask11::run(coreId:requested=3,set=useless, threadId:00000000000011C0, kThreadSetAffinity:false) - ENTER

*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

2024-04-19 08:04:39.108895: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-19 08:04:39.146708: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-04-19 08:04:39.148400: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2024-04-19 08:04:39.149579: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2024-04-19 08:04:39.155703: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2024-04-19 08:04:39.156856: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2024-04-19 08:04:39.158412: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2024-04-19 08:04:39.159694: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2024-04-19 08:04:39.160883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2024-04-19 08:04:39.161673: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

2024-04-19 08:04:39.612397: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #0 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #3 ***

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #4 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #5 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel delivery ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #6 ***
*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #1 ***

*[COMPV INFO]: [UltAlprSdkEngine]Call: ultimateAlprSdk::UltAlprSdkEngine::warmUp

*[COMPV INFO]: [UltAlprSdkEnginePrivate]*** Entering parallel process for job #2 ***
*[COMPV INFO]: /!\ Code in file 'source\ultimate_alpr_detector.cxx' in function 'ultimateAlpr::UltAlprDetector::process' starting at line #29: Not optimized -> Batching will not be activated for this function
*[COMPV INFO]: /!\ Code in file 'source\ultimate_ocr_tensorflow_session_detect.cxx' in function 'ultimateOcr::UltOcrTensorflowSessionDetector::processInternal' starting at line #116: Not optimized -> Batching not supported for this function

*[ULTALPR_SDK INFO]: Elapsed time (ALPR) = [[[ 3460.484000 millis ]]]
*[ULTALPR_SDK INFO]: result: {"duration":29,"frame_id":99,"latency":0}
*[ULTALPR_SDK INFO]: *** elapsedTimeInMillis: 3460.484000, estimatedFps: 28.897692 ***
*[ULTALPR_SDK INFO]: Press any key to terminate !!

Mamadou DIOP

Apr 19, 2024, 1:24:21 PM

to Mike Bedford, doubango-ai

1/ When you enabled OpenVINO, the plugin failed to load, which means OpenVINO isn't being used. This is confirmed by the fact that no OpenVINO version is printed in the log. The most likely cause is that some dependencies are missing. Check https://github.com/DoubangoTelecom/ultimateALPR-SDK/tree/master/samples/c%2B%2B/benchmark#dependencies

2/ Is the GPU enabled on this PC? It looks like it isn't. We need logs with the GPU not used, because what we want is to compare OpenVINO on CPU against TensorFlow on CPU (rather than GPU). The low fps suggests it's running on CPU; if that's the case, then it's OK.
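
For anyone comparing several of these benchmark runs, here is a minimal Python sketch that pulls the three signals discussed above out of a log: the CUDA libraries TensorFlow could not load, any plugin that failed to load, and the reported fps. The function names `summarize_log` and `fps_from_elapsed` are hypothetical helpers and not part of the SDK, the regular expressions are only based on the log lines pasted in this thread, and the plugin path in the sample is shortened for illustration. The fps formula is an assumption that happens to agree with the numbers in the logs above (`--loops 100`). On Windows, error code 0x7e (126) is ERROR_MOD_NOT_FOUND, which is reported when the DLL itself or one of its dependencies cannot be found.

```python
import re

# Patterns based only on the log lines quoted in this thread (assumption:
# other SDK or TensorFlow versions may word these messages differently).
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")
FAILED_PLUGIN = re.compile(
    r"Failed to load library with path=(.+?), Error: (0x[0-9a-fA-F]+)"
)
FPS = re.compile(r"elapsedTimeInMillis: ([\d.]+), estimatedFps: ([\d.]+)")


def fps_from_elapsed(loops, elapsed_millis):
    """Frames per second, assuming one frame is processed per loop."""
    return loops * 1000.0 / elapsed_millis


def summarize_log(text):
    """Collect missing libraries, failed plugins and the reported fps."""
    missing = sorted(set(MISSING_LIB.findall(text)))
    # Keep only the file name of each plugin and decode the hex error code.
    plugins = [
        (path.rsplit("\\", 1)[-1], int(code, 16))
        for path, code in FAILED_PLUGIN.findall(text)
    ]
    match = FPS.search(text)
    fps = float(match.group(2)) if match else None
    return {"missing_libs": missing, "failed_plugins": plugins, "fps": fps}


# Sample lines copied from the log above; the plugin path is shortened.
SAMPLE = r"""
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
message: [CompVSharedLib] Failed to load library with path=C:\sdk\ultimatePluginOpenVino.dll, Error: 0x0000007e
*[ULTALPR_SDK INFO]: *** elapsedTimeInMillis: 3460.484000, estimatedFps: 28.897692 ***
"""

if __name__ == "__main__":
    summary = summarize_log(SAMPLE)
    for name, code in summary["failed_plugins"]:
        print(f"plugin failed to load: {name} (error {code})")
    for lib in summary["missing_libs"]:
        print(f"missing library: {lib}")
    print(f"reported fps: {summary['fps']}")
```

Running the same function over the TensorFlow-only log and the OpenVINO log makes it easy to see at a glance whether the OpenVINO plugin actually loaded and whether the CUDA libraries were found, before reading anything into the fps difference.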
