Two almost identical images produce very different text outputs!


Iod

Sep 1, 2015, 9:23:18 AM
to tesseract-ocr
Hi Community,

I am seeing strange behaviour with tesseract-3.02.02 on an Ubuntu 12.04 system. The four attached input files ('top.png', 'bottom.png', 'left.png', 'right.png') each differ from 'reference.png' by a single row or column of pixels appended to its top, bottom, left, or right edge. For example, 'top.png' was produced by adding a single row of pixels above the 'reference.png' image. Visually, the five images look nearly identical. However, tesseract's output changes drastically in some cases (see the 'output.png' table below).
1. Am I doing something terribly wrong?
2. If anybody is able to reproduce this, do you know which piece of the pipeline is responsible for this discrepancy?
3. Is there a quick workaround?
Thanks for helping.


top.png
bottom.png
left.png
right.png
reference.png
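
For anyone trying to reproduce this, the way the four variants relate to the reference image can be sketched as follows. This is only an illustration, not the script actually used to create the attachments: the function name `pad_one_pixel`, the list-of-rows grayscale representation, and the white fill value (255) are all assumptions, since the post does not say what colour the added pixels have.

```python
def pad_one_pixel(rows, side, fill=255):
    """Return a copy of a grayscale image with one extra row or column.

    `rows` is a list of rows, each row a list of pixel values.
    `side` is one of "top", "bottom", "left", "right".
    `fill` is the value of the added pixels (assumed white here).
    """
    if side not in ("top", "bottom", "left", "right"):
        raise ValueError("unknown side: %r" % (side,))
    width = len(rows[0])
    copied = [list(r) for r in rows]  # do not modify the input image
    if side == "top":
        return [[fill] * width] + copied
    if side == "bottom":
        return copied + [[fill] * width]
    if side == "left":
        return [[fill] + r for r in copied]
    return [r + [fill] for r in copied]


# A tiny 2x2 stand-in for 'reference.png'.
reference = [[0, 255],
             [255, 0]]

# One variant per side, mirroring top.png, bottom.png, left.png, right.png.
variants = {side: pad_one_pixel(reference, side)
            for side in ("top", "bottom", "left", "right")}
```

Each variant has exactly one more row or one more column than the reference, and all original pixels are unchanged, which is the relationship described for the attached files.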

Iod

Sep 1, 2015, 12:03:11 PM
to tesseract-ocr
Sorry, here is tesseract's output for three different settings of the '-psm' option (see attached). Thanks.
output.png
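
A comparison like the one in the attached table could be scripted roughly as below. This is a sketch under assumptions: the post does not say which three '-psm' values were tried, so the values 3, 6 and 7 are placeholders, and the helper names and output base names are hypothetical. The command layout follows the tesseract 3.02 usage `tesseract imagename outputbase [-psm pagesegmode]`.

```python
import os
import subprocess

IMAGES = ["top.png", "bottom.png", "left.png", "right.png", "reference.png"]

# Placeholder page segmentation modes; the three values actually used
# in the original test are not stated in the post.
PSM_VALUES = [3, 6, 7]


def tesseract_command(image, psm, outbase=None):
    """Build the argument list for one tesseract 3.02 run."""
    if outbase is None:
        # Hypothetical naming scheme: top.png with psm 6 -> top_psm6(.txt)
        outbase = "{}_psm{}".format(os.path.splitext(image)[0], psm)
    return ["tesseract", image, outbase, "-psm", str(psm)]


# One command per (image, psm) pair.
commands = [tesseract_command(img, psm)
            for img in IMAGES for psm in PSM_VALUES]


def run_all(cmds):
    """Run every command; requires tesseract to be installed and on PATH."""
    for cmd in cmds:
        subprocess.check_call(cmd)
```

Calling `run_all(commands)` would write one text file per image and mode, which can then be compared side by side.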