Hi,
Good to see you have fixed one issue.
1/ TensorFlow 2.18 requires CUDA 12.5 while you're using 12.8
2/ Using TensorFlow on Linux inside WSL on a Windows 11 host and trying to configure the GPU is a disaster in the making
3/ Your GPU is stalling. If you write a simple Python benchmark (see the sketch below) you'll hit the same issue.
4/ My advice: don't use WSL
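For point 3/, here is a minimal sketch of what I mean by a simple Python benchmark (assuming TensorFlow is installed and a GPU is visible; the matrix size and loop count are arbitrary). If the per-iteration time under WSL is far higher than on native Linux, the GPU is stalling:

# Minimal GPU throughput check (sketch): repeated matmul on the GPU.
import time
import tensorflow as tf

with tf.device("/GPU:0"):
    a = tf.random.uniform((2048, 2048))
    b = tf.random.uniform((2048, 2048))
    _ = tf.matmul(a, b)  # warm-up: the first call is always slower
    start = time.perf_counter()
    for _ in range(50):
        c = tf.matmul(a, b)
    _ = c.numpy()  # force completion of the async GPU work
    print(f"avg per matmul: {(time.perf_counter() - start) / 50:.4f} sec")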
How did you get the "27.066 sec"?
You should use the benchmark app; if not, explain how you measured the duration.
A GPU takes much more time to process the first image, which is why every benchmark app you can find on the internet does a warm-up. Check
https://github.com/DoubangoTelecom/KYC-Documents-Verif-SDK/blob/fb4bd2bf7fbfaa207560f77feb235d5e0c328f5a/samples/cpp/benchmark/benchmark.cxx#L210
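The same idea in Python, as a sketch (process_image here is a hypothetical stand-in for whatever call you're timing, not an SDK function): run it once before starting the clock so the first-image cost isn't counted.

# Warm-up before timing (sketch); process_image is a hypothetical stand-in.
import time

def benchmark(process_image, image, loops=20):
    process_image(image)  # warm-up: the first run pays one-time GPU init costs
    start = time.perf_counter()
    for _ in range(loops):
        process_image(image)
    elapsed = time.perf_counter() - start
    print(f"{loops} loops in {elapsed:.3f} sec ({loops / elapsed:.2f} fps)")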
Google deprecated TensorFlow GPU support on Windows (after TF 2.10), so you won't be able to run it directly on Windows unless you use TF 2.6 + CUDA 11. We only support TF 2.6.0, 2.14.0, 2.16.1 and 2.18.0.
We use Ubuntu, so I'd recommend that OS.
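A quick way to confirm which TensorFlow build you ended up with and whether it actually sees the GPU (plain TensorFlow API, nothing SDK-specific):

# Print the installed TensorFlow version and the GPUs it can see.
import tensorflow as tf

print(tf.__version__)                          # should be one of 2.6.0 / 2.14.0 / 2.16.1 / 2.18.0
print(tf.config.list_physical_devices("GPU"))  # an empty list means TF is running CPU-only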
Using Workers will slow down the code. Check https://www.doubango.org/SDKs/kyc-documents-verif/docs/Architecture_overview.html#thread-safety
The process function is auto-locked, which means only one thread can run it at a time; all the others will be blocked.
The C++ code looks like this:
int process() {
    COMPV_AUTOLOCK(mutex); // <- all your workers will be locked here.
    ....
}
and we don't support parallel processing (https://www.doubango.org/SDKs/kyc-documents-verif/docs/Parallel_processing.html) with Python.
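To illustrate why extra workers don't help (a generic sketch, not SDK code): when the body of the call is guarded by a single lock, N threads still execute it one after another, so the total time is the same as a plain loop, plus the locking overhead.

# Sketch: a lock-guarded function gives no speedup with multiple threads.
import threading, time

lock = threading.Lock()

def process():
    with lock:           # stands in for COMPV_AUTOLOCK(mutex)
        time.sleep(0.1)  # stands in for the real per-image work

start = time.perf_counter()
threads = [threading.Thread(target=process) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# ~0.8 sec total: the 8 "workers" ran strictly one after another.
print(f"{time.perf_counter() - start:.2f} sec")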
An RTX 3060 can process 7 images per second. Your numbers show you're 8 to 9 times slower.
Benchmark numbers: https://github.com/DoubangoTelecom/KYC-Documents-Verif-SDK/tree/main/samples/cpp/benchmark#peformance-numbers
Collect logs for 3 scenarios: GPU only, CPU only, and both with work-balancing. Use the Californian Driver License or any other public image (an image you can share).
GPU only:
LD_LIBRARY_PATH=../../../binaries/linux/x86_64:$LD_LIBRARY_PATH ./benchmark \
    --image "../../../assets/images/United States - California Driving License (2017).jpg" \
    --assets ../../../assets \
    --loops 20 \
    --vino_activation "off" \
    --gpu_ctrl_mem false \
    --parallel true

CPU only:
LD_LIBRARY_PATH=../../../binaries/linux/x86_64:$LD_LIBRARY_PATH ./benchmark \
    --image "../../../assets/images/United States - California Driving License (2017).jpg" \
    --assets ../../../assets \
    --loops 20 \
    --vino_activation "on" \
    --gpu_ctrl_mem false \
    --parallel true

Both:
LD_LIBRARY_PATH=../../../binaries/linux/x86_64:$LD_LIBRARY_PATH ./benchmark \
    --image "../../../assets/images/United States - California Driving License (2017).jpg" \
    --assets ../../../assets \
    --loops 20 \
    --vino_activation "auto" \
    --gpu_ctrl_mem false \
    --parallel true
The issue is that you've included the JPEG decoding in the timing (https://gist.github.com/luxzg/98077666ba85b8e5c71d84d56b98f405#file-benchmark-cxx-L174), plus a disk access to read the file on each loop.
The JPEG decoder used is https://github.com/DoubangoTelecom/KYC-Documents-Verif-SDK/blob/main/samples/cpp/stb_image.h and it is not optimized at all.
The process function has 3 variants. You're using the 1st one, which requires raw/uncompressed data. The 3rd one accepts compressed data and uses libjpeg-turbo to decode the image. Use the 3rd variant in your benchmark app and check if it's faster.
It's common practice not to include image decoding in benchmarking.
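As a sketch of what that looks like (again with a hypothetical process() stand-in, not a specific SDK call): decode the image once, outside the timed loop, and only time the processing itself.

# Sketch: keep file I/O and JPEG decoding out of the timed loop.
import time
import numpy as np
from PIL import Image  # any decoder works; PIL is just an example here

def process(pixels):
    # hypothetical stand-in for the SDK call being benchmarked
    return pixels.mean()

# Decode ONCE, outside the timed loop.
pixels = np.asarray(Image.open("United States - California Driving License (2017).jpg").convert("RGB"))

start = time.perf_counter()
for _ in range(20):
    process(pixels)  # only the processing is timed
elapsed = time.perf_counter() - start
print(f"avg: {elapsed / 20:.3f} sec per image")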