Not able to extract English text from black text on a white background

John Alway

Jun 20, 2022, 10:18:19 AM
to tesseract-ocr
Hello,

I've been working with Tesseract 4.1.1 (in C#, Visual Studio). I've been taking screenshots of small regions of my screen to capture text from YouTube comments, TikTok chats, etc., and it has done a great job of converting the text in those images.

However, it failed when I took a screenshot of text in my Notepad application, which is the simplest text imaginable: a list of words in black text on a white background. It only extracted a word or two from the document. The screenshot is just 190 by 260 pixels, and its reported resolution is 300x300 dpi.

Here is the text from the document:


Up
Hey
Down
What
Left
Left
So
That's
It
Start
and this
what
Down

Here is the screenshot I had Tesseract extract text from:

screenshot4.jpg

Is there a way to fix this problem?

Many thanks for any help.

Regards,
...John

Zdenko Podobny

Jun 20, 2022, 2:53:59 PM
to tesser...@googlegroups.com

On Mon, Jun 20, 2022 at 16:18, John Alway <jal...@gmail.com> wrote: