generic meme extraction?


Glenn C

Feb 14, 2024, 2:28:30 PM
to tesseract-ocr
Hi all,

I'm trying to build a meme text extractor. Since I don't know the font, location, or other details of the text in advance, I can't use the usual documented or internet recommendations such as character whitelists or single-line mode. In this example, the detection is almost too accurate: I want only the meme text, not the other text in the image.

What are the best ways to filter the output or preprocess the image for this kind of problem? (Most internet recommendations cover noise filtering, color inversion, etc., which aren't really useful here.)
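
As a rough illustration of the kind of output filtering being asked about, here is a minimal sketch, not a recommended solution. It assumes the OCR result is available as a dict of parallel lists with `text`, `conf`, and `height` keys, which is the shape pytesseract's `image_to_data(..., output_type=Output.DICT)` returns. The function name, the thresholds, and the sample values are all hypothetical, and the heuristic itself (caption text is usually much taller than incidental text in the photo) is an assumption that has not been checked against the attached image.

```python
def keep_large_confident_words(data, min_conf=60.0, height_ratio=0.6):
    """Keep words that are confident and nearly as tall as the tallest word.

    `data` is a dict of parallel lists with "text", "conf" and "height" keys.
    Both thresholds are illustrative guesses, not tuned values.
    """
    words = []
    for text, conf, height in zip(data["text"], data["conf"], data["height"]):
        text = text.strip()
        conf = float(conf)  # conf may be a str or a number; -1 marks non-word rows
        if text and conf >= min_conf:
            words.append((text, height))
    if not words:
        return []
    # Measure every surviving word against the tallest surviving word.
    tallest = max(height for _, height in words)
    return [text for text, height in words if height >= height_ratio * tallest]


# Made-up sample standing in for real OCR output (not from the attached image).
sample = {
    "text": ["", "WHEN", "THE", "CODE", "WORKS", "nike", "exit", "~~"],
    "conf": [-1, 95, 93, 96, 91, 88, 72, 20],
    "height": [400, 60, 58, 61, 59, 14, 10, 55],
}
print(keep_large_confident_words(sample))
```

On the made-up sample this keeps the four large caption words and drops the small background words and the low-confidence fragment. A real meme with both large and small caption text, or with large text on signs or clothing, would likely need additional cues such as position or color.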

I've attached a sample image, along with my tesseract output.

Thanks in advance!

IMG_5592.jpg
imgtotext.png

Glenn Cochran

Feb 22, 2024, 12:16:29 PM
to tesser...@googlegroups.com
Hi experts,

I’ve read that tesseract is not good at OCR on images such as internet photos, but does well on PDF text.

Is this true, or do I need to build some complex training to guide it?
