Re: Improve Current Tesseract Results


Tom Morris

Jan 13, 2021, 10:24:15 AM
to tesseract-ocr
I suspect your problem has more to do with the tabular format and the ruling lines than with the text being Korean or the image quality. You might want to search the archive for other threads on handling tabular data and/or line removal. There's a Leptonica tutorial on line removal (http://www.leptonica.org/line-removal.html), but table OCR is a little specialized.
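To make the line-removal idea concrete, here is a minimal sketch in pure Python. It is not the method from the Leptonica tutorial, just a toy run-length approach: it assumes the image has already been binarized into a list of rows (1 = ink, 0 = background), and the function name `remove_long_runs` and the `min_len` threshold are made up for illustration.

```python
def remove_long_runs(img, min_len):
    """Erase horizontal and vertical ink runs of at least min_len pixels.

    img is a list of equal-length rows with 1 = ink and 0 = background.
    Returns a new image; the input is left unchanged.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    erase = [[False] * w for _ in range(h)]

    # Mark long horizontal runs (candidate table rules).
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs the same way, column by column.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Clear every marked pixel, keep everything else.
    return [[0 if erase[y][x] else img[y][x] for x in range(w)]
            for y in range(h)]
```

The threshold should be longer than any stroke in the text, otherwise parts of characters get erased too. Pixels where a character touches or crosses a rule are also lost, which is one reason real line-removal pipelines are more involved than this.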

Tom

On Wednesday, January 13, 2021 at 8:12:58 AM UTC-5 Glenn wrote:
Hello, I am currently working on a Korean dataset and am having trouble getting all the values read correctly. Two of the problems are that the pictures are slightly wonky and that the text is in Korean.

ApplicationFrameHost_bxb8Ck9yTh.png

I cropped the image and converted it to greyscale to try to improve it, but it still looks slightly blurry. I'm not sure whether this is the best approach; I can crop out a larger image if needed.

The current problem is that the recognition results are not very good. The default settings give me a jumble. Although I found that psm 4 works best, the output still does not look very good, and it seems like Tesseract just breaks halfway through.
Code_I1PxTycm88.png
How can I improve this? I was thinking of cutting the image into slices and reading each one separately, but I am still not sure that would fix it. Is the image quality just not good enough?

Thank you
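The slicing idea from the question could be sketched roughly as follows. This is only an illustration under assumptions: the image is taken to be already binarized into a list of rows (1 = ink, 0 = background) with the table rules removed, and the helper name `split_rows` and its `min_gap` parameter are hypothetical. It finds horizontal bands of ink separated by blank rows, a simple projection-profile split.

```python
def split_rows(img, min_gap=1):
    """Find horizontal bands of ink separated by blank rows.

    img is a list of rows with 1 = ink and 0 = background.
    Returns (top, bottom) row indices per band, bottom exclusive.
    A band ends only after at least min_gap consecutive blank rows.
    """
    bands = []
    start = None   # first row of the band currently being collected
    blank = 0      # consecutive blank rows seen inside that band
    for y, row in enumerate(img):
        if any(row):
            if start is None:
                start = y
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                # Close the band at the last row that contained ink.
                bands.append((start, y - blank + 1))
                start = None
                blank = 0
    if start is not None:
        # Image ended mid-band; drop any trailing blank rows.
        bands.append((start, len(img) - blank))
    return bands
```

Each band could then be cropped from the original image and passed to Tesseract on its own, for example with `--psm 7`, which treats the input as a single text line. Whether that actually helps on this dataset is untested here.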