> Do you know if tesseract reads images of 1 bit/pixel (binarized and
> compacted)? Your images are 1 bit/pixel. If tesseract expects 8 or 24
> bits/pixel then open one image using paint on Windows and save it (SAVE AS)
> BMP or JPG with 256 colors (8 bits/pixel) and run tesseract on the result to
> see if it will work.
> > I used identify on one of the faxes.
> > identify fax000130161.tif
> > fax000130161.tif[0] TIFF 1728x2148 DirectClass 137kb
> > fax000130161.tif[1] TIFF 1728x2148 DirectClass 137kb
> > fax000130161.tif[2] TIFF 1728x2148 DirectClass 137kb
> >
> > Could you tell me what to look for: what the best resolution is, or
> > some specific property of the file I should be passing to tesseract?
> > Are they too big or too small?
> >
> > I assume I would be able to convert them using convert from
> > ImageMagick, so if you could also provide me with the command line
> > arguments that would be great.
> >
> > convert --resolution --some other arguments filename.tiff
I just OCR'd my whole directory and the results vary a lot.
for i in *.tif; do tesseract "$i" "$i"; done
Some pages are near perfect, some are really bad.
What is the optimal resolution for the preprocessed files?
How does tesseract deal with 3-page TIFF files? About 60% of my images
are multi-page, but it looks like only the first page gets converted.
What would be the ImageMagick command to convert a 3-page image to 1
page? Or is there a command for tesseract that would tell it to scan
all pages?
Lucas
Is 204 not enough?
Also, I tried resizing from Resolution: 204x98 to 300x300, but the result
wasn't quite readable. Is aspect ratio tied to resolution, or is that a
separate thing? I am thinking that maybe the 204x98 should be
resampled to 300x150 or something like that?
Since I didn't get an answer, I assume tesseract doesn't need training
for the English language since it is prebuilt with it. Correct?
Lucas
> This may be the reason for poor results. Also, you would want to split
> each of your multi-page files into single pages and then merge the
> text results back together. Check out libtiff for functions.
I will see if ImageMagick is able to do that. I think there should be
a tool available to do that.
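Here is a minimal sketch of the split-and-merge approach suggested above,
assuming ImageMagick's convert and tesseract are both on the PATH. The
function name ocr_multipage and the -000 style page suffix are just
choices made for illustration, and I have not tuned this for any
particular tesseract version:

```shell
# Split a multi-page TIFF into single pages, OCR each page, and
# concatenate the per-page text into one file named after the input.
# Assumption: ImageMagick (convert) and tesseract are installed.
ocr_multipage() {
    in=$1
    base=${in%.*}
    # +adjoin plus %03d in the output name writes one file per page:
    # base-000.tif, base-001.tif, ...
    convert "$in" +adjoin "${base}-%03d.tif" || return 1
    # Start with an empty merged text file.
    : > "${base}.txt"
    for page in "${base}"-[0-9][0-9][0-9].tif; do
        # tesseract appends .txt to the output base name itself.
        tesseract "$page" "${page%.tif}" || return 1
        cat "${page%.tif}.txt" >> "${base}.txt"
    done
}
```

Usage would be something like: for i in *.tif; do ocr_multipage "$i"; done
The tiffsplit tool from libtiff could be used for the splitting step
instead of convert.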
--
Paper Less?
http://lucasmanual.com/mywiki/ImageManagement
To get reasonable results, the resolution should be 300dpi in both
directions. I use 400dpi. With 204x98, there is not much you can do,
as the information is already lost. Resampling to 300dpi will just
make the image larger without adding back the lost information.
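On the aspect ratio question: a 204x98 scan has non-square pixels, so
stretching it vertically by 204/98 (about 208%) at least removes the
distortion, even though it cannot bring back lost detail. A rough
sketch, assuming ImageMagick's convert is on the PATH; the function
names square_scale_pct and make_square_pixels are made up for this
example:

```shell
# Print the vertical scale percentage that makes the pixels square,
# given horizontal and vertical resolution in dpi (integers).
square_scale_pct() {
    xres=$1
    yres=$2
    # Rounded integer division: 100 * xres / yres
    echo $(( (100 * xres + yres / 2) / yres ))
}

# Stretch the image vertically so both directions end up at xres dpi.
# Assumption: ImageMagick (convert) is installed.
make_square_pixels() {
    in=$1
    out=$2
    pct=$(square_scale_pct "$3" "$4")
    # "100%xN%" scales width and height by separate percentages;
    # -units/-density record the new resolution in the output file.
    convert "$in" -resize "100%x${pct}%" \
        -units PixelsPerInch -density "${3}x${3}" "$out"
}
```

For example, make_square_pixels fax.tif fax-square.tif 204 98 would
scale the height by 208% and leave the width alone.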
Regards
Jeff