Hello,
I get different results from tesseract depending on how it was
compiled.
(Tested under Ubuntu 7.10 and Ubuntu 8.04.)
1) If I compile tesseract while libtiff is installed:
- Compressed TIFF files are recognized right away.
- But many other files are not recognized: the OCR result is illegible
(even after "unpaper" processing through gscan2pdf).
2) If I compile tesseract while libtiff is not installed:
- All the files are recognized after "unpaper" processing through
gscan2pdf.
I have attached an example named "pag1.tif".
Result with TIFF support enabled: "result_tiff"
Result with TIFF support enabled, after unpaper processing:
"result_tiff_unpaper"
Result with TIFF support disabled, after unpaper processing:
"result_no_tiff_unpaper".
A friend (Claude) proposes a new option. What do you think of it?
Claude (xcfaudio[at]gmail.com): "I modified tesseract 2.01, which now
takes into account a new third parameter: -lwt (Lang With Tiff), which
can be used in place of -l.
When tesseract is compiled with the TIFF library, this option makes it
possible to activate (-lwt lang) or deactivate (-l lang) the use of
libtiff.
The new option -lwt lang activates libtiff and should therefore allow
tesseract to work properly on compressed TIFF files from the command
line.
These modifications are made in the files:
cutil/globals.h
ccmain/tesseractmain.cpp
(Ref.: b52)
We believe that you could formally include these changes in Tesseract
(2.01 and 2.03).
I can send you the sources with the changes.
Best regards"
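
For illustration only, the behaviour described above (-l lang keeps
libtiff off, -lwt lang turns it on) could be handled roughly as
follows. This is a hypothetical sketch, not Claude's actual patch: the
names LangOption and ParseLangOption are invented here and do not
exist in the tesseract sources.

```cpp
#include <cstring>
#include <string>

// Hypothetical result of interpreting the language flag and its value.
struct LangOption {
  bool valid;        // true if the flag was "-l" or "-lwt"
  bool use_libtiff;  // true only when the flag was "-lwt"
  std::string lang;  // language code that followed the flag
};

// Interprets a flag/value pair as described in the proposal:
//   -l   <lang>  -> language set, libtiff not used
//   -lwt <lang>  -> language set, libtiff used
// Any other flag is reported as invalid.
LangOption ParseLangOption(const char* flag, const char* lang) {
  LangOption opt = {false, false, ""};
  if (flag == nullptr || lang == nullptr) {
    return opt;
  }
  if (std::strcmp(flag, "-lwt") == 0) {
    opt.valid = true;
    opt.use_libtiff = true;
  } else if (std::strcmp(flag, "-l") == 0) {
    opt.valid = true;
  }
  if (opt.valid) {
    opt.lang = lang;
  }
  return opt;
}
```

With this sketch, ParseLangOption("-lwt", "eng") would request libtiff,
while ParseLangOption("-l", "eng") would leave it off.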