Not able to extract English text from black text on a white background



John Alway


Jun 20, 2022, 10:18:19 AM

to tesseract-ocr

Hello,

I've been working with Tesseract 4.1.1 (in C#, Visual Studio). I've been taking screenshots of small regions of my screen to capture text from YouTube comments, TikTok chats, etc., and it has done a great job of converting the text in those images.

However, it failed when I took a screenshot of text in my Notepad application, which is the simplest text imaginable: a list of words, black text on a white background. It only extracted a word or two from the document. The screenshot is just 190 by 260 pixels, and the image file reports a resolution of 300x300 dpi.
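A rough check of glyph size, as a sketch only: assuming the 13 lines of the word list quoted in this post fill the 260-pixel height evenly, and assuming the lowercase x-height is about 0.45 of the line pitch (a guessed ratio for typical UI fonts, not a measurement), the text comes out just under the Tesseract FAQ's rule of thumb that accuracy drops off sharply below an x-height of about 10 pixels. The 300 dpi tag in the file adds no pixels, so it does not change this estimate.

```python
def estimate_x_height(image_height_px: int, n_lines: int, x_ratio: float = 0.45) -> float:
    """Rough x-height: line pitch times an ASSUMED x-height/pitch ratio."""
    pitch = image_height_px / n_lines
    return pitch * x_ratio

# Approximate rule of thumb from the Tesseract FAQ, not a hard limit.
FAQ_MIN_X_HEIGHT_PX = 10

pitch = 260 / 13                       # 13 listed lines in a 260 px tall screenshot
x_height = estimate_x_height(260, 13)  # relies on the assumed 0.45 ratio
print(f"line pitch ~ {pitch:.1f} px, x-height ~ {x_height:.1f} px")
print("below rule-of-thumb minimum" if x_height < FAQ_MIN_X_HEIGHT_PX else "probably large enough")
```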

Here is the text from the document:

Up
Hey
Down
What
Left
Left
So
That's
It
Start
and this
what
Down

Here is the screenshot I ran Tesseract on.

Is there a way to fix this problem?

Many thanks for any help.

Regards,

...John

Zdenko Podobny


Jun 20, 2022, 2:53:59 PM

to tesser...@googlegroups.com

What about reading the docs?

https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
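The linked page covers rescaling, borders and page segmentation modes, among other things. Below is a minimal, library-free sketch of the first two steps, assuming the screenshot has already been loaded as rows of 0-255 grayscale values. The 20-pixel target x-height, the nearest-neighbour resampling and the 10-pixel border width are illustrative assumptions, not values taken from the docs; in a real C# or Python pipeline you would normally do this with an imaging library, where a smoother resampling filter may work better.

```python
import math

def scale_factor_for(x_height_px: float, target_px: float = 20.0) -> int:
    """Smallest integer factor lifting the estimated x-height to target_px (ASSUMED target)."""
    return max(1, math.ceil(target_px / x_height_px))

def upscale_nearest(img, factor):
    """Nearest-neighbour upscale of a grayscale image stored as rows of 0-255 ints."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat each row vertically
    return out

def add_border(img, width, fill=255):
    """Pad the image with a solid border on all sides (white by default)."""
    cols = len(img[0]) + 2 * width
    padded = [[fill] * cols for _ in range(width)]
    for row in img:
        padded.append([fill] * width + list(row) + [fill] * width)
    padded.extend([fill] * cols for _ in range(width))
    return padded

# Tiny 2x3 stand-in for the real screenshot (0 = black ink, 255 = white background).
tiny = [[255, 0, 255],
        [0, 0, 0]]
factor = scale_factor_for(9.0)  # 9 px is the rough x-height estimate for this screenshot
big = add_border(upscale_nearest(tiny, factor), width=10)
print(factor, len(big), len(big[0]))
```

For the 190 by 260 screenshot, a factor of 3 gives 570 by 780 pixels before the border is added. The saved result could then be tried from the command line with `tesseract scaled.png stdout --psm 6`, where `scaled.png` is a hypothetical file name and `--psm 6` tells Tesseract to treat the image as a single uniform block of text; whether that mode suits this word list is untested here.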

Zdenko

On Mon, 20 Jun 2022 at 16:18, John Alway <jal...@gmail.com> wrote:
