Tesseract command works on OSX but not Windows


Jonathan Warrick

Nov 18, 2015, 3:13:39 AM
to tesseract-ocr

I have written a script that uses Tesseract to extract simple text from a .tif file. It works as expected on OSX, but does not seem to work at all when I run the same command on a Windows machine (Windows 7; the OS of the machine it will eventually run on is currently unknown). The image is very straightforward and is interpreted without problems on OSX using the command `tesseract tmp.tif out -psm 7`.


On Windows, I have installed Tesseract 3.02 and tried to run this same command; after prompting me for permission to run, the command executes and outputs a file, but the file is completely empty (0 bytes). The other strange thing I've noticed is that nothing gets printed to STDOUT when running on Windows, whereas on OSX this line is always printed: Tesseract Open Source OCR Engine v3.02.02 with Leptonica.


Any ideas what's going on? The file is exactly the same, so I don't see how it could be related to image quality. Any help is much appreciated!
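
One way to narrow down an empty output file with no console output is to run the command from Ruby and capture both output streams and the exit code. The following is a minimal sketch, not the poster's script: the helper names `tesseract_command` and `run_captured` are hypothetical, and it assumes `tesseract` is on the PATH. Only the arguments quoted in the post (`tmp.tif out -psm 7`) are used.

```ruby
require "open3"

# Hypothetical helper: build the argument list for the command quoted in
# the post. Passing an array avoids shell quoting differences between
# OSX and Windows.
def tesseract_command(image, out_base, psm = 7)
  ["tesseract", image, out_base, "-psm", psm.to_s]
end

# Hypothetical helper: run a command and return everything it printed on
# stdout and stderr, together with its exit code.
def run_captured(argv)
  stdout, stderr, status = Open3.capture3(*argv)
  { stdout: stdout, stderr: stderr, exit_code: status.exitstatus }
end

# Example usage (assumes tesseract is installed and tmp.tif exists):
#   result = run_captured(tesseract_command("tmp.tif", "out"))
#   puts "exit code: #{result[:exit_code]}"
#   puts "stdout: #{result[:stdout].inspect}"
#   puts "stderr: #{result[:stderr].inspect}"
```

Comparing the three values on both machines would show whether the Windows run reports an error that the console window hides.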

tmp.tif

Marco Atzeri

Nov 18, 2015, 5:28:59 AM
to tesser...@googlegroups.com
On 18/11/2015 05:32, Jonathan Warrick wrote:
> I have written a script that utilizes Tesseract to extract simple text
> from a |.tif| file, which works perfectly as expected when using OSX,
> but does not seem to work at all when I try and run the command on a
> Windows machine (Windows 7; OS of machine it will eventually be run on
> unknown currently). The image is very straight-forward and has no
> problem being interpreted using the command |tesseract tmp.tif out -psm
> 7| on OSX.
>
>
> On Windows, I have installed Tesseract 3.02 and tried to run this same
> command; after prompting me for permission to run, the command executes
> and outputs a file, but it is completely empty (0 bytes). The other
> strange thing I've noticed is that there is no output that gets printed
> to |STDOUT| when running on Windows; this line always outputs on OSX:
> |Tesseract Open Source OCR Engine v3.02.02 with Leptonica|.
>
>
> Any ideas what's going on? The file is exactly the same, so I don't see
> how it could be related to image quality. Any help is much appreciated!

No idea; maybe it is a permission issue on the directory.
However, using the cygwin package from the Cygwin Terminal, it works fine:

$ tesseract tmp.tif out -psm 7
Tesseract Open Source OCR Engine v3.04.00 with Leptonica

$ cat out.txt
90

Regards
Marco

Jonathan Warrick

Nov 18, 2015, 8:28:55 AM
to tesseract-ocr
I was finally able to take an image produced by my script and manipulate it so that Tesseract reads it, so it does appear to be an image quality issue that behaves differently on OSX and on PC. To do this, I used the SnagIt editor on my PC, adjusted the DPI to 300, and re-saved the file. Is there a way to do this from the script itself? The process needs to iterate over many images, so manual adjustment is not an option. This is a legacy script written in Ruby that I inherited, so I've been experimenting with ImageMagick via the RMagick gem, but have not had any success. I've also uploaded a sample of the `.tif` file, along with the output that I used to compare against the original image.
new_tmp_working_version.txt
new_tmp.tif
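
The DPI adjustment described above could be scripted by calling the ImageMagick command-line tool from Ruby instead of going through RMagick. The following is a minimal sketch under stated assumptions, not a confirmed fix: the helper names, the `_300dpi` output suffix, and the default executable name `convert` are all hypothetical. On Windows, `convert` can resolve to an unrelated system utility, so a full path to the ImageMagick binary may be needed. The `-units` and `-density` options only tag the file as 300 DPI; if SnagIt actually resampled the pixels, ImageMagick's `-resample` option would be the closer equivalent.

```ruby
# Hypothetical helper: build an ImageMagick command that writes a copy of
# the image tagged as `dpi` pixels per inch. The pixels are not resampled.
def set_density_command(input, output, dpi = 300, executable = "convert")
  [executable, input, "-units", "PixelsPerInch", "-density", dpi.to_s, output]
end

# Hypothetical helper: derive an output name next to the input file,
# e.g. "new_tmp.tif" becomes "new_tmp_300dpi.tif".
def density_output_path(input, dpi = 300)
  input.sub(/\.tif\z/i, "_#{dpi}dpi.tif")
end

# Convert every path in the list. Each entry of the result holds the
# output path and whether the command succeeded; `system` returns nil
# when the executable cannot be started.
def set_density_for_all(paths, dpi = 300, executable = "convert")
  paths.map do |path|
    output = density_output_path(path, dpi)
    ok = system(*set_density_command(path, output, dpi, executable))
    { input: path, output: output, ok: ok == true }
  end
end

# Example usage (assumes ImageMagick is installed):
#   results = set_density_for_all(Dir.glob("*.tif"))
#   results.each { |r| puts "#{r[:input]} -> #{r[:output]} (ok: #{r[:ok]})" }
```

Each converted file could then be passed to the same `tesseract ... -psm 7` command as before.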