Tesseract newbie - No output from tesseract

Tenzin

Dec 19, 2008, 4:25:12 AM
to tesseract-ocr
Hi,

I am new to tesseract. To test and play around with it, I
created an English document with a few sentences in my word processor
(OpenOffice.org Writer) and then took a screenshot of the document
using the Print Screen command in GNOME. I edited this screenshot in
GIMP to select just the text part of the image and saved it in
uncompressed TIFF format (.tif). Using this file as my input, I ran
tesseract with the following command:

tesseract test.tif test -l eng

The program terminates after printing the text "Tesseract Open Source
OCR Engine". It doesn't show any error. The output text file test.txt
is also created, but it is just 1 byte in size and has no content in
it. I tried creating the test image file using different fonts, but
the same thing happens. I would therefore like to know if I am doing
something wrong here.

Please help. Also, is there an option to make the tesseract
application more verbose, or to make it log its details to a file? I
am using tesseract 2.03 on Debian Lenny.

Regards
Tenzin
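
A quick way to confirm the symptom described above (an output file that exists but holds nothing useful) is to check whether it contains anything besides whitespace. This is only a sketch in Python; the helper names `is_blank` and `ocr_output_is_blank` are made up for illustration and are not part of tesseract.

```python
from pathlib import Path


def is_blank(data: bytes) -> bool:
    """Return True if the bytes hold nothing but whitespace (e.g. a lone newline)."""
    return data.strip() == b""


def ocr_output_is_blank(path: str) -> bool:
    """Return True if the file at `path` is empty or whitespace-only."""
    return is_blank(Path(path).read_bytes())


if __name__ == "__main__":
    # "test.txt" is the output file name from the command in the post above.
    print(ocr_output_is_blank("test.txt"))
```

A 1-byte output file would most likely be reported as blank by this check, assuming that single byte is a newline.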

Sascha Schimke

Dec 19, 2008, 5:23:27 AM
to tesser...@googlegroups.com
Hello,

I think the resolution of your screenshot is simply too small. I use
tesseract with scanned images at 300 dpi. A screen typically has less
than 100 dpi. Try exporting a PDF file from OpenOffice, then convert
it to TIFF using Photoshop or GIMP. That way you can select the
resolution in dpi when opening the PDF file in GIMP or Photoshop.

Regards,
Sascha
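
To put numbers on the resolution gap mentioned above, here is a small Python sketch of the arithmetic. The 96 dpi screen value and the 800x600 crop are assumptions chosen for illustration (the post only says "less than 100 dpi"), and the function names are hypothetical.

```python
def scale_factor(source_dpi: float, target_dpi: float = 300.0) -> float:
    """How many times larger each dimension must be to reach target_dpi."""
    return target_dpi / source_dpi


def scaled_size(width_px: int, height_px: int,
                source_dpi: float, target_dpi: float = 300.0) -> tuple:
    """Pixel dimensions the same page area would have at target_dpi."""
    factor = scale_factor(source_dpi, target_dpi)
    return round(width_px * factor), round(height_px * factor)


if __name__ == "__main__":
    # Assumed example: an 800x600 screenshot crop taken on a 96 dpi screen.
    print(scale_factor(96))           # 3.125
    print(scaled_size(800, 600, 96))  # (2500, 1875)
```

Under these assumptions, the text in a screenshot is rendered with roughly a third of the pixels per inch that a 300 dpi scan would have. Rendering the PDF at 300 dpi, as suggested above, produces real detail at that size, which is not the same as simply enlarging the screenshot.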

74yrs old

Dec 19, 2008, 11:50:08 AM
to tesser...@googlegroups.com
Yes. Resolution should be 300 dpi.
cheers

lab

Dec 19, 2008, 8:37:02 PM
to tesseract-ocr
In my experience, TIFF files sometimes have an alpha layer. The
easiest way to ensure a usable image for tesseract is to do these two
steps (on Debian):

convert test.tif new_test.pbm
convert new_test.pbm new_test.tif

Then try to OCR the file new_test.tif. The convert program is part of
the imagemagick package.
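
The two conversion steps and the OCR run above could be scripted along these lines. This is a sketch only: it assumes `convert` (ImageMagick) and `tesseract` are on the PATH, uses exactly the command forms quoted in this thread, and the function names and the `new_` prefix handling are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def roundtrip_commands(tiff_path: str, lang: str = "eng") -> list:
    """Build the TIFF -> PBM -> TIFF -> tesseract command lines."""
    src = Path(tiff_path)
    pbm = src.with_name(f"new_{src.stem}.pbm")    # intermediate bitmap
    clean = src.with_name(f"new_{src.stem}.tif")  # re-encoded TIFF
    out_base = clean.with_suffix("")              # tesseract appends .txt
    return [
        ["convert", str(src), str(pbm)],
        ["convert", str(pbm), str(clean)],
        ["tesseract", str(clean), str(out_base), "-l", lang],
    ]


def run_roundtrip(tiff_path: str, lang: str = "eng") -> None:
    """Run the commands in order, stopping at the first failure."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in roundtrip_commands(tiff_path, lang):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in roundtrip_commands("test.tif"):
        print(" ".join(cmd))
```

Calling `roundtrip_commands("test.tif")` only builds the command lists, so they can be inspected before anything is executed; `run_roundtrip` is the part that actually invokes the tools.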

Ray Smith

Dec 20, 2008, 2:02:34 PM
to tesser...@googlegroups.com

ABB

Dec 23, 2008, 2:35:12 AM
to tesseract-ocr
Link not found :-(

On Dec 21, 12:02 am, "Ray Smith" <theraysm...@gmail.com> wrote:
> See http://code.google.com/p/tesseract-ocr/issues/detail?id=160Ray.

Ray Smith

Dec 23, 2008, 7:47:09 PM
to tesser...@googlegroups.com

Tenzin Dendup

Dec 24, 2008, 2:21:39 AM
to tesser...@googlegroups.com
Thanks for all the help. The OCR worked for me. Changing the resolution to 300 dpi helped in getting some output, but the recognition accuracy was very low. Using the convert program to convert the GIMP-created TIFF images to PBM, then using convert again to change the PBM files back to TIFF, and then running tesseract made it work very well (recognition was almost 100%).

Once again, thanks all for the help.

Regards

74yrs old

Dec 24, 2008, 7:42:26 AM
to tesser...@googlegroups.com
Ray,
The link http://code.google.com/p/tesseract-ocr/issues/detail?id=160Ray. was not found. The error message is as follows:

"Not Found

The requested URL /p/tesseract-ocr/issues/detail?id=160Ray was not found on this server."

On Wed, Dec 24, 2008 at 6:17 AM, Ray Smith <thera...@gmail.com> wrote:
> http://code.google.com/p/tesseract-ocr/issues/detail?id=160

With regards,
-sriranga