Tenzin

Dec 19, 2008, 4:25:12 AM

to tesseract-ocr

Hi,

I am new to tesseract. To test and play around with it, I created an
English document with a few sentences in my word processor
(OpenOffice.org Writer) and then, using the PrintScreen key in GNOME,
created a screenshot of the document. I edited this screenshot in GIMP
to select just the text part of the image and saved it in uncompressed
TIFF format (.tif). Using this file as my input, I ran tesseract with
the following command:

tesseract test.tif test -l eng

The program terminates after printing the text "Tesseract Open Source
OCR Engine". It doesn't show any error. The output text file test.txt
is also created, but it is just 1 byte in size and has no content. I
tried creating the test image with different fonts, but the same thing
happens. I would therefore like to know if I am doing something wrong
here.

Please help. Also, is there an option to make tesseract more verbose,
or to make it log its details to a file? I am using tesseract 2.03 on
Debian Lenny.

Regards
Tenzin
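
The symptom described here (tesseract exits without an error but leaves a one-byte test.txt) can at least be detected automatically. The following is a minimal shell sketch, assuming tesseract 2.x writes OUTBASE.txt for the command shown above (as the post reports for test.txt). The helpers has_text and ocr_and_check are hypothetical names, not part of tesseract, and this does not add any logging option to tesseract itself; it only inspects the output file.

```shell
#!/bin/sh
# Sketch: run tesseract as in the post above and flag an empty result.
# Assumes tesseract 2.x, which writes OUTBASE.txt for
# "tesseract IMAGE OUTBASE -l eng". Function names are hypothetical.

# has_text FILE: succeed only if FILE contains a non-whitespace character.
has_text() {
    grep -q '[^[:space:]]' "$1"
}

# ocr_and_check IMAGE OUTBASE: run tesseract, then warn if OUTBASE.txt
# holds nothing but whitespace (e.g. a single newline byte).
ocr_and_check() {
    tesseract "$1" "$2" -l eng || return 1
    if has_text "$2.txt"; then
        echo "OK: $2.txt contains text"
    else
        echo "WARNING: $2.txt is empty; check image resolution and format" >&2
        return 1
    fi
}
```

For example, `ocr_and_check test.tif test` runs the same command as in the post and returns a failure status when test.txt contains only whitespace, such as a lone newline.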

Sascha Schimke

Dec 19, 2008, 5:23:27 AM

Hello,

I think your screenshot's resolution is simply too low. I use
tesseract with scanned images at 300 dpi, whereas a screen typically
has less than 100 dpi. Try exporting a PDF file from OpenOffice, then
convert it to TIFF (using Photoshop or GIMP). That way you can choose
the dpi when opening the PDF file in GIMP or Photoshop.

Regards,
Sascha
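
Sascha's resolution argument can be made concrete: font sizes are given in points (1/72 inch), so the number of pixels available per character scales directly with the dpi. The sketch below takes 96 dpi as an example of a screen with "less than 100 dpi". The helper names pt_to_px and rasterize_pdf are hypothetical, and rasterize_pdf swaps in ImageMagick's convert (which needs Ghostscript for PDF input) in place of the GIMP or Photoshop route described above.

```shell
#!/bin/sh
# Sketch: how resolution affects character size, plus one way to render a
# PDF at 300 dpi. Assumes ImageMagick's convert with Ghostscript available
# for PDF input. Function names are hypothetical.

# pt_to_px POINTS DPI: size in pixels of a font's em at DPI,
# using 1 pt = 1/72 inch (integer arithmetic, rounded down).
pt_to_px() {
    echo $(( $1 * $2 / 72 ))
}

# rasterize_pdf INPUT.pdf OUTPUT.tif: render the PDF at 300 dpi.
# -density goes before the input file so the PDF is rasterized at that
# resolution rather than at ImageMagick's default of 72 dpi.
rasterize_pdf() {
    convert -density 300 "$1" "$2"
}
```

Here `pt_to_px 12 96` prints 16 and `pt_to_px 12 300` prints 50, so the same 12 pt text gets roughly three times as many pixels per character at 300 dpi. `rasterize_pdf test.pdf test.tif` would then produce a 300 dpi TIFF from an exported test.pdf.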

74yrs old

Dec 19, 2008, 11:50:08 AM

Yes. The resolution should be 300 dpi.
cheers

lab

Dec 19, 2008, 8:37:02 PM

In my experience, TIFF files sometimes have an alpha layer. The
easiest way to ensure a usable image for tesseract is to run these two
commands (on Debian):

convert test.tif new_test.pbm
convert new_test.pbm new_test.tif

Then try to OCR the file new_test.tif. The convert program is part of
the imagemagick package.
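
The two convert commands above can be wrapped in a small function. PBM is a 1-bit black-and-white format with no alpha channel, so the round trip through it discards any alpha layer and reduces the image to pure black and white. A minimal sketch, assuming ImageMagick's convert is on the PATH and the output name ends in an extension such as .tif; flatten_via_pbm is a hypothetical name, and the intermediate .pbm is placed next to the output file.

```shell
#!/bin/sh
# Sketch of the two-step TIFF -> PBM -> TIFF conversion as a function.
# PBM has no alpha channel, so the round trip drops any alpha layer.
# Requires ImageMagick's convert; the function name is hypothetical.

# flatten_via_pbm INPUT.tif OUTPUT.tif
flatten_via_pbm() {
    tmp_pbm="${2%.*}.pbm"              # intermediate file next to the output
    convert "$1" "$tmp_pbm" || return 1
    convert "$tmp_pbm" "$2" || return 1
    rm -f "$tmp_pbm"                   # remove the intermediate bitmap
}
```

For example: `flatten_via_pbm test.tif new_test.tif && tesseract new_test.tif new_test -l eng`.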

Ray Smith

Dec 20, 2008, 2:02:34 PM

See http://code.google.com/p/tesseract-ocr/issues/detail?id=160

Ray.

ABB

Dec 23, 2008, 2:35:12 AM

Link not found :-(

On Dec 21, 12:02 am, "Ray Smith" <theraysm...@gmail.com> wrote:
> See http://code.google.com/p/tesseract-ocr/issues/detail?id=160Ray.

Ray Smith

Dec 23, 2008, 7:47:09 PM

http://code.google.com/p/tesseract-ocr/issues/detail?id=160

Tenzin Dendup

Dec 24, 2008, 2:21:39 AM

Thanks for all the help; the OCR works for me now. Changing the dpi to 300 helped in getting some output, but the recognition rate was very low. Using convert to turn the GIMP-created TIFF images into PBM files, then using convert again to turn the PBM files back into TIFF, and then running tesseract made it work very well (recognition was almost 100%).

Once again, thanks all for the help.

Regards

74yrs old

Dec 24, 2008, 7:42:26 AM

Ray,
The link http://code.google.com/p/tesseract-ocr/issues/detail?id=160Ray. is not found; the error message is as follows:

"Not Found

The requested URL /p/tesseract-ocr/issues/detail?id=160Ray was not found on this server."

On Wed, Dec 24, 2008 at 6:17 AM, Ray Smith <thera...@gmail.com> wrote:

> http://code.google.com/p/tesseract-ocr/issues/detail?id=160

With regards,
-sriranga

Tesseract newbie - No output from tesseract
