Tesseract Returns Inconsistent Results


Rodger Berry Jr.

Oct 27, 2024, 7:16:04 AM
to tesseract-ocr
I am trying to pull the text from several images from a game. I cannot access the game's memory, so screenshots are what I have to work with. I've converted the images to grayscale, but I'm still not getting any text returned from this image. I do get text from images that are extremely similar, but nothing from this one.

I'm not entirely certain what the best options are for increasing the accuracy of what Tesseract returns, but I will wind up having to process over 1 million images similar to this one. I'm happy to do the research, but I'm unsure of the best route to pursue, given the issues I'm having with this image and how many I'll need to process soon.
1124_1175.png

Zdenko Podobny

Oct 27, 2024, 9:39:29 AM
to tesser...@googlegroups.com
Hi,

please provide a color version of the image.



Zdenko



Rodger Berry Jr.

Oct 27, 2024, 11:39:03 AM
to tesseract-ocr
Thank you for those links.  I'll check them out now.  In the meantime, here is the color version of the image you requested.
Screenshot 2024-10-27 103831.png

Zdenko Podobny

Nov 2, 2024, 2:14:33 PM
to tesser...@googlegroups.com
This is quite difficult for Tesseract because of the complex background and image structure. You would have to use another tool for text detection or document layout analysis first, and then try to OCR the cropped regions with Tesseract.
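A minimal sketch of that crop-then-OCR idea, assuming the bounding boxes come from some separate text detector and that Pillow and pytesseract are installed. The helper names `pad_box` and `ocr_regions`, the 4-pixel margin, and the choice of `--psm 7` (single text line) are illustrative assumptions, not something tested on this screenshot:

```python
def pad_box(box, image_size, margin=4):
    """Grow a (left, top, right, bottom) box by `margin` pixels, clipped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def ocr_regions(image_path, boxes, margin=4):
    """OCR each detected region on its own and return the recognized strings."""
    # Imported here so pad_box stays usable without Pillow or Tesseract installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path).convert("RGB")
    texts = []
    for box in boxes:
        crop = image.crop(pad_box(box, image.size, margin))
        # --psm 7 asks Tesseract to treat the crop as a single line of text.
        text = pytesseract.image_to_string(crop, config="--psm 7")
        texts.append(text.strip())
    return texts
```

The small margin is there because crops that touch the glyph edges can hurt recognition; whether 4 pixels is enough for these game labels would need to be checked on real samples.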

For your case, a better solution would be to use something like Microsoft OmniParser. Its demo.ipynb produces these results for your image:

game_screen_color_res.png
parsed_content_list = ['Text Box ID 0: 3',
 'Text Box ID 1: heFarmGuy',
 'Text Box ID 2: TheFarmGuy',
 'Text Box ID 3: catnap far',
 'Text Box ID 4: Gladiator fart',
 'Text Box ID 5: Gladiator far3',
 'Text Box ID 6: TheFarmGuy',
 'Text Box ID 7: NEIT_Farm',
 'Text Box ID 8: ator far3',
 'Text Box ID 9: cat nap farm',
 'Text Box ID 10: NEIT_Farm06',
 'Text Box ID 11: Q.A farm1',
 'Text Box ID 12: arm07',
 'Text Box ID 13: (FHn)Alliance Center',
 'Icon Box ID 14: the number 3.']
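If you end up processing many images this way, the labels can be pulled out of a list like the one above with a few lines of Python. This is only a sketch: it assumes every entry follows the "Text Box ID n: ..." or "Icon Box ID n: ..." pattern seen in this output, and `text_boxes` is a hypothetical helper name, not part of OmniParser.

```python
import re

# Pattern inferred from the entries shown above; other formats are not handled.
ENTRY = re.compile(r"^(Text|Icon) Box ID (\d+): (.*)$")


def text_boxes(parsed_content_list):
    """Return (box_id, text) pairs for text boxes, skipping icon boxes."""
    results = []
    for entry in parsed_content_list:
        match = ENTRY.match(entry)
        if match and match.group(1) == "Text":
            results.append((int(match.group(2)), match.group(3)))
    return results
```

Entries that do not match the pattern are silently skipped, so it may be worth logging them while the format is still an unknown.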

