Problem with detecting the very last line of the paragraph I want to detect in my image


AdamTuby

Sep 17, 2021, 11:41:45 AM
to tesseract-ocr
I'm trying to detect a paragraph in an image. I do that by preparing in advance the very first and very last expressions in the paragraph, so that I can draw a box around the whole paragraph, but I'm having trouble detecting the very last expressions. It detects the characters of the very last words, but for some reason it returns completely different characters that don't match what's actually written there, although the wrong output is consistent (the same every time).
I'm attaching some PNGs to illustrate the issue. By the way, apologies in advance if the content of the paragraph is inappropriate; I'm simply doing my testing on this specific PNG.

The PNGs:
Screenshot from 2021-09-17 18-28-29.png
Screenshot from 2021-09-17 18-28-19.png
Screenshot from 2021-09-17 18-28-13.png
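
The post does not include code, so the following is only a rough sketch of the approach described above: find the first and last words among the OCR results and take the union of the word boxes in between. It assumes the results are in the dictionary shape returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` (parallel lists under `text`, `left`, `top`, `width`, `height`). The function name `paragraph_box` is hypothetical, and matching single words rather than multi-word expressions is a simplification.

```python
def paragraph_box(data, first_word, last_word):
    """Return (left, top, right, bottom) spanning first_word..last_word, or None.

    `data` is assumed to hold parallel lists under the keys
    "text", "left", "top", "width" and "height".
    """
    # Keep only entries that contain recognized text, remembering their index.
    words = [(i, t.strip()) for i, t in enumerate(data["text"]) if t.strip()]

    # First occurrence of the opening word, last occurrence of the closing word.
    start = next((i for i, t in words if t == first_word), None)
    end = next((i for i, t in reversed(words) if t == last_word), None)

    # If OCR misread either anchor word, there is nothing to box.
    if start is None or end is None or end < start:
        return None

    idx = [i for i, _ in words if start <= i <= end]
    left = min(data["left"][i] for i in idx)
    top = min(data["top"][i] for i in idx)
    right = max(data["left"][i] + data["width"][i] for i in idx)
    bottom = max(data["top"][i] + data["height"][i] for i in idx)
    return (left, top, right, bottom)
```

With this kind of exact matching, a consistently misread last word makes the lookup fail and the function returns `None`, which matches the symptom described in this post.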

AdamTuby

Sep 18, 2021, 5:59:16 AM
to tesseract-ocr

Solved the issue myself. It turned out to stem from one of the very first fixes I had tried: converting the image to grayscale. Apparently all I needed was to convert it to a binary (black-and-white) image instead of a grayscale bitmap. Everything's working much better now. Thank you all!
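
The post does not say which library or threshold was used, so the following is only a minimal sketch of what converting a grayscale image to a binary one can look like. It works on plain nested lists of 0..255 values and picks the threshold with Otsu's method, which is an assumption here and not necessarily what the poster did. The names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of those pixel values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold=None):
    """Map each grayscale pixel to 0 or 255; use Otsu's threshold by default."""
    if threshold is None:
        threshold = otsu_threshold([p for row in gray for p in row])
    return [[255 if p > threshold else 0 for p in row] for row in gray]
```

With OpenCV, `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)` applies the same idea to an 8-bit grayscale array, and the resulting binary image can then be passed to Tesseract.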