(Py)Tesseract does not detect all expected hex codes in my image despite multiple preprocessing strategies

JoeBlack

Jun 29, 2025, 11:33:04 PM
to tesseract-ocr
Source Code: https://pastebin.com/7XvxnZa6

I'm trying to OCR a screenshot of a "breach protocol" minigame from Cyberpunk 2077. In the game, there's a grid of two-character hex codes (e.g., 1C, 55, etc.). My goal is to detect all expected hex codes from a known list. I'm using multiple preprocessing methods (thresholding, adaptive thresholding, sharpening, etc.) and morphological closing to improve recognition. Despite this, some codes in the main matrix are still missed completely.

Below is my code that runs several preprocessing passes, combines unique tokens across passes, and draws rectangles around recognized hex codes. Any suggestions for improving detection accuracy, especially for codes missed in the main matrix, would be greatly appreciated!
breach_protocol_screenshot.png
Unbenannt.png

Lorenzo Bolzani

Jun 30, 2025, 5:20:08 AM
to tesser...@googlegroups.com
Hi Joe,
add this to your code:

    gray = preprocess(gray_base.copy())
    cv2.imshow(f"Detected unique hex codes {idx}", gray)
    cv2.waitKey(0)

You'll see the problem: two of the preprocessing methods generate junk, and the other two are almost identical. For those two, the problem is a threshold that is too high: try 85 (the font on the right is lighter). Always visualize the image after EACH preprocessing sub-step to see where it starts to go bad.

For a problem like this, with fixed patterns, scale, colors, etc., I would use cv2.matchTemplate; it should be 100% accurate.


You can also "blend" each code into a blob with dilate/erode, find the connected components (e.g. cv2.connectedComponentsWithStats or cv2.findContours), crop out each region, and run matchTemplate on each fragment. But I think the grid location is fixed, so I would just use two nested loops to crop the codes at exact locations and template match.
If you have a smartphone picture as input, you may want to use the first method so you can do a cv2.warpPerspective to align/rescale the grid to a fixed size/location before proceeding.

But matchTemplate should handle multiple matches fine so there is no need to complicate things.

Please let me know how it ends.


Bye

Lorenzo

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/070afc51-4c9f-4692-b05a-11b06a13964dn%40googlegroups.com.

JoeBlack

Jun 30, 2025, 1:06:51 PM
to tesser...@googlegroups.com
https://pastebin.com/4J40QcJE

Template matching works better than Tesseract, and the code is much shorter as well.
I just had to split the screenshot into two areas, one for the matrix and one for the sequence, because the text colors are different.

success.png