Ideas for image pre-processing


Soul Green

Oct 15, 2021, 2:47:00 AM
to tesseract-ocr
Hi all, I have an issue with Tesseract (tesseract.js, if that matters) detecting the wrong things in an image. In the following image, it picks up the artefact in the top-right quadrant and, for some reason, outputs only "LEVEL", with no digits.

 fail_gyarados.png
I realize that removing the artefacts is the best solution, but they can be unpredictable in position and shape.
Does anyone have any good ideas or resources you can point me towards to isolate and remove these artefacts?
They always start on an edge, so my intuition is that I could (somehow) remove any pixel that is connected to the edge through a chain of adjacent pixels. But I am not sure how to read and modify image data in such a way, or whether I should use an existing library to do so. I am also not sure what search terms to use to research such algorithms.
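For reference, the edge-connected removal idea is usually described as a flood fill seeded from the image border, and some image libraries call it "clearing the border". Below is a minimal sketch of it, not a tested solution for this image. It assumes the image has already been thresholded into a flat, row-major array where 1 is foreground (text or artefact) and 0 is background. The function name `clearBorderComponents` and that data layout are hypothetical choices made for illustration, and 4-connectivity is assumed.

```javascript
// Hypothetical helper: remove every foreground region that touches the border.
// Assumption: `pixels` is a flat, row-major Uint8Array of length width * height,
// with 1 = foreground and 0 = background.
function clearBorderComponents(pixels, width, height) {
  const out = Uint8Array.from(pixels); // work on a copy, leave the input intact
  const stack = [];

  // If (x, y) is foreground, erase it and queue it so its neighbours get visited.
  const seed = (x, y) => {
    const i = y * width + x;
    if (out[i] === 1) {
      out[i] = 0;
      stack.push(i);
    }
  };

  // Start from every foreground pixel on the four edges.
  for (let x = 0; x < width; x++) {
    seed(x, 0);
    seed(x, height - 1);
  }
  for (let y = 0; y < height; y++) {
    seed(0, y);
    seed(width - 1, y);
  }

  // Iterative flood fill (an explicit stack avoids deep recursion on large images).
  while (stack.length > 0) {
    const i = stack.pop();
    const x = i % width;
    const y = (i - x) / width;
    if (x > 0) seed(x - 1, y);
    if (x < width - 1) seed(x + 1, y);
    if (y > 0) seed(x, y - 1);
    if (y < height - 1) seed(x, y + 1);
  }
  return out;
}
```

In a browser, the 0/1 array could be built by thresholding the RGBA data returned by a canvas 2D context's `getImageData`, and the cleaned result written back with `putImageData` before handing the canvas to tesseract.js. One caveat: any text that touches the border, or touches an artefact, would be erased too, so this only helps when the artefacts are separated from the text by at least one background pixel.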