Re: Discrepancy in KYC Data Extraction Output


Mamadou DIOP

May 13, 2025, 10:19:52 PM
to Ravi Kakadiya, douba...@googlegroups.com
Hi,

Your image is cut: about 1/6 of it is missing. In the past you posted the same kind of “issue” (cut image) at:
- …

So, I’ll take the time to explain why you must not cut the image, in the hope that you won’t keep asking the same question again and again.

As explained at https://www.doubango.org/SDKs/kyc-documents-verif/docs/Graph_computation.html#nodes-generation, we use graph computation on nodes generated from the image you provide. An image may contain millions of nodes and it’d take several minutes to process all of them. To speed up the process we use something called random sample consensus (RANSAC, https://en.wikipedia.org/wiki/Random_sample_consensus) in several parts of the pipeline. One of the RANSAC algorithms used is MAGSAC (https://www.doubango.org/SDKs/kyc-documents-verif/docs/Graph_computation.html#marginalizing-sample-consensus-magsac). Other RANSAC variants are used by other modules (Persp, STN, GAS…).

The two words “random” and “sample” mean that a subset of the nodes is randomly selected for processing. This subset is representative of the whole set of nodes. Say you have 5 million nodes for an image: the subset may be as small as 7 thousand nodes. “Random” doesn’t mean “blind”; there are well-known methods for selecting a subset that is truly representative of the whole set.
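
To make the “random sample” idea concrete, here is a toy sketch of plain RANSAC on a 2D line fit. It is not the SDK's code and it is not MAGSAC: the names `fit_line` and `ransac_line`, the threshold, the iteration count and the synthetic data are all assumptions chosen for illustration.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def ransac_line(points, n_iters=200, threshold=0.05, seed=0):
    """Toy RANSAC: fit a line while ignoring outliers (illustration only)."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iters):
        # Randomly pick the minimal sample: 2 points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair, cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Score the candidate by how many points agree with it.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < threshold]
        if len(inliers) > len(best):
            best = inliers
    # Refit on the consensus set only.
    a, b = fit_line(best)
    return a, b, len(best)

# Synthetic data: 100 points near y = 2x + 1 plus 30 gross outliers.
data_rng = random.Random(42)
points = []
for _ in range(100):
    x = data_rng.uniform(0.0, 10.0)
    points.append((x, 2.0 * x + 1.0 + data_rng.gauss(0.0, 0.01)))
for _ in range(30):
    points.append((data_rng.uniform(0.0, 10.0), data_rng.uniform(-20.0, 40.0)))

slope, intercept, n_inliers = ransac_line(points)
print(slope, intercept, n_inliers)
```

Only 2 of the 130 points are drawn per iteration, yet the fit recovers the line because the points cover the whole range. If a whole chunk of the range were missing, many of the random pairs would describe the line much less reliably.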

When you cut the image, it becomes extremely difficult to get a representative subset because a whole chunk is missing. The selected subset may lead to a degenerate solution when we try to compute the homography or the TPS model.

Because the solution is degenerate (think of it as being “on the edge of the cliff”) and the samples are “random”, you could easily fall off the cliff if the randomly selected subset is not good. Both the homography and the TPS models are matrices. Being on the edge has a scientific name: an “ill-conditioned matrix”. Quote from https://en.wikipedia.org/wiki/Condition_number: "In numerical analysis, the condition number of a function measures how much the output value of the function can change for a small change in the input argument. This is used to measure how sensitive a function is to changes or errors in the input, and how much error in the output results from an error in the input."
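
A tiny numeric illustration of what “ill-conditioned” means, as a sketch only: the 2x2 systems below are made-up toy matrices, not the homography or TPS matrices computed by the SDK, and `solve_2x2` is a hypothetical helper. The same small change in the input barely moves the solution of the well-conditioned system and throws the solution of the ill-conditioned one far away.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve the 2x2 linear system A x = b with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

# Well-conditioned system: the rows point in clearly different directions.
good = (1.0, 0.0, 0.0, 1.0)
# Ill-conditioned system: the two rows are almost identical.
bad = (1.0, 1.0, 1.0, 1.0001)

shifts = {}
for name, m in (("good", good), ("bad", bad)):
    x0 = solve_2x2(*m, 2.0, 2.0)
    x1 = solve_2x2(*m, 2.0, 2.001)  # tiny change in the input
    # Largest change in any component of the solution.
    shifts[name] = max(abs(x1[0] - x0[0]), abs(x1[1] - x0[1]))
    print(name, x0, x1, shifts[name])
```

For the "bad" system an input change of 0.001 moves the solution by roughly 10, which is the “edge of the cliff” situation: a slightly different random subset can give a very different model.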

In your message you said that it works with our online demo but doesn’t work on your local machine. That’s not true at all. You tried the online demo several times with the same image; sometimes it worked and sometimes it didn’t, and you picked the time it worked. It sometimes works and sometimes doesn’t because you’re “on the edge of the cliff” (the homography is ill-conditioned). If you don’t cut the image, you’ll be far away from the cliff.

I don’t know which OS you’re using, but the behaviour depends on it:
- Windows: you may get a different result each time you restart the process
- Linux: you may get a different result each time you restart the thread

I have tried the same image with Regula (https://api.regulaforensics.com/) and it fails. I tried with Microsoft Azure Form Recognizer (https://parsio.io/id-documents/) and it works. In the coming versions we will check how to improve the accuracy when the document is cut.

You can change the way the nodes are selected by using a different resampler; check https://www.doubango.org/SDKs/kyc-documents-verif/docs/Configuration_options.html#magsac-resampler. Note that the TPS module does not use MAGSAC, which means a different resampler may give you a better result for the registration, but not always.

On 13 May 2025, at 11:38, Ravi Kakadiya <ravikak...@gmail.com> wrote:

Dear Team,

I hope this message finds you well.

I am currently working on extracting data from a document using the KYC feature. However, I’ve noticed a discrepancy between the output received from the website (https://www.doubango.org/webapps/kyc-documents-verif) you provided and the output from the application (verify).

Please find the attached screenshot for your reference.

Could you kindly assist in clarifying why there is a difference between the two outputs, and advise on how to proceed?

Looking forward to your support.

Best regards,
Ravi Kakadiya


<verify_output.txt><document.jpg><website_output.png>

Ravi Kakadiya

May 14, 2025, 12:00:47 AM
to Mamadou DIOP, douba...@googlegroups.com

Dear Team,

Thank you for your response and the detailed technical explanation.

I wanted to share that I’ve tried multiple times with the same image, and in my case, it consistently works—even when I compress the resolution. The results have been stable across different runs on my system.

I understand that the process involves randomness and may behave differently across environments, but based on my tests, the output has been reliable so far.

Please let me know if there’s any specific configuration or test case you’d like me to try on my end to help further investigate this.

Best regards,
Ravi Kakadiya


Mamadou DIOP

May 14, 2025, 12:37:01 AM
to Ravi Kakadiya, douba...@googlegroups.com

Maybe you have a point here:

- your logs show that the document is classified as being "Australia - Medicare Card #1 (Interim)" instead of "Germany - Id Card (2010-2021) Side B"

- when I try on my machine (Windows 8) using the verify app, I get the same issue

- on the same machine, when I change --vino_activation "auto" to --vino_activation "off", it works fine

- it looks like an issue with OpenVINO. The server doesn't use OpenVINO, it uses a GPU (CUDA/TensorRT)

Let me check why it works with TensorFlow and TensorRT/CUDA but doesn't work with OpenVINO.

On 5/14/2025 6:11 AM, 'Mamadou DIOP' via doubango-ai wrote:

Attached are 2 results using your image: they're different. I tried just now using the online demo. Your claim that "it consistently works" or is "reliable so far" is not correct.

There is no "specific configuration" to try. I spent time explaining in detail so that you understand that you must not cut the image; if you do so, it's at your own risk.

Ravi Kakadiya

May 14, 2025, 12:40:41 AM
to Mamadou DIOP, douba...@googlegroups.com
Thank you for your explanation. 

Ravi Kakadiya

May 14, 2025, 3:12:25 AM
to Mamadou DIOP, douba...@googlegroups.com
Thanks for your explanation. 

On Wed, May 14, 2025 at 12:02 PM Mamadou DIOP <diopm...@doubango.org> wrote:

I have checked the issue with OpenVINO and the problem is that it detects 2 cards. Check the attached image (blue and red boxes).

The OpenVINO model is generated from the TensorFlow one but is optimized. If you check your card you can understand why it could be mistaken for 2 cards.

I have done a quality check on OpenVINO, running it against a dataset of 8,000 images, and the accuracy is what we expect. This particular image produces a different result with OpenVINO, but there is no issue in the model optimization.

We're working to replace TensorFlow and OpenVINO with ONNX-RT in all our projects.

For now I don't have a fix for this particular image. To get the same result as the online demo you'll have to use a GPU or disable OpenVINO to fall back to TensorFlow on CPU. Check https://www.doubango.org/SDKs/kyc-documents-verif/docs/Configuration_options.html#openvino-activation
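
If the verify app is launched from a script, a small helper like the one below could force the flag mentioned earlier in this thread to "off". This is only a sketch: `with_vino_activation` is a hypothetical helper, the `./verify` path is an assumption, and every other argument the app needs is deliberately left out. Only `--vino_activation` with the values "auto" and "off" comes from this thread.

```python
def with_vino_activation(args, value):
    """Return a copy of args with --vino_activation set to value."""
    out = list(args)
    if "--vino_activation" in out:
        idx = out.index("--vino_activation")
        if idx + 1 < len(out):
            out[idx + 1] = value  # overwrite the existing value
        else:
            out.append(value)     # flag was last and had no value
    else:
        out += ["--vino_activation", value]
    return out

# "./verify" is an assumed path; add the app's other arguments as needed.
cmd = with_vino_activation(["./verify", "--vino_activation", "auto"], "off")
print(" ".join(cmd))
```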
