tiff activated/deactivated

Sorbus

May 10, 2008, 2:50:26 PM

to tesseract-ocr

Hello,
I get different results from tesseract depending on how it was
compiled.
(Tested under Ubuntu 7.10 and Ubuntu 8.04.)

1) If I compile tesseract while libtiff is installed:
- Compressed TIFF files are recognized straight away.
- But many other files are not recognized: the OCR result is illegible
(even after "unpaper" processing through gscan2pdf).

2) If I compile tesseract while libtiff is not installed:
- All the files are recognized after "unpaper" processing through
gscan2pdf.

I have attached an example named "pag1.tif".
Result with TIFF activated: "result_tiff"
Result with TIFF activated after unpaper processing:
"result_tiff_unpaper"
Result with TIFF deactivated after unpaper processing:
"result_no_tiff_unpaper".

A friend (Claude) proposes a new option. What do you think of it?

Claude (xcfaudio[at]gmail.com): "I modified tesseract 2.01 so that it
now accepts a new form of the third parameter: -lwt (Lang With
Tiff), which can be used in place of -l.

When tesseract is compiled with the TIFF library, this option turns
the use of LibTiff on (-lwt lang) or off (-l lang).

The new option -lwt lang activates libtiff ...
... and should therefore allow tesseract to work properly on
compressed TIFF files from the command line.

These modifications are made in the files:
cutil/globals.h
ccmain/tesseractmain.cpp
(Ref.: b52)

We believe that you could formally include these changes in Tesseract
(2.01 and 2.03).
I can send you the sources with the changes.

Best regards"
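
To make the proposal easier to discuss, here is a minimal sketch of how the third command-line argument might be interpreted. It is not Claude's actual patch, which I have not reproduced here: the names LangOption and ParseLangOption are hypothetical, and the sketch assumes the usual call form "tesseract image outputbase -l lang" with "eng" as the default language.

```cpp
#include <cstring>
#include <string>

// Hypothetical container for the parsed option (not from the real patch).
struct LangOption {
  bool valid;        // true if -l or -lwt was found with a language after it
  bool use_libtiff;  // true for -lwt, false for -l
  std::string lang;  // language code that follows the flag
};

// Assumed call form: tesseract <image> <outputbase> [-l|-lwt <lang>]
// argv[3] is the flag and argv[4] is the language code.
LangOption ParseLangOption(int argc, const char* const argv[]) {
  LangOption opt = {false, false, "eng"};
  if (argc < 5) {
    return opt;  // no language option given: keep the defaults
  }
  if (std::strcmp(argv[3], "-lwt") == 0) {
    opt.use_libtiff = true;   // "Lang With Tiff": read the image through libtiff
  } else if (std::strcmp(argv[3], "-l") == 0) {
    opt.use_libtiff = false;  // plain -l: bypass libtiff
  } else {
    return opt;  // unrecognised flag: leave the option marked invalid
  }
  opt.valid = true;
  opt.lang = argv[4];
  return opt;
}
```

With something like this, the image-reading code would only need to check use_libtiff before choosing between the libtiff reader and the built-in one, so a single binary compiled with libtiff could handle both kinds of files.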
