1 Bit or Grayscale?


NA

Apr 4, 2008, 10:38:46 AM
to tesseract-ocr
I tried to OCR a color TIFF with Tesseract, but it didn't work. I
converted it to grayscale using ImageMagick and that works, as does
converting it to 1 bpp. Which will give better results for color
images: converting them to grayscale or to black and white (1 bit per
pixel)?

Hussein

Apr 4, 2008, 10:58:33 AM
to tesser...@googlegroups.com
It depends on which does the better binarization (thresholding) from grayscale (8 bpp) to binary (1 bpp). If your preprocessing does a better job than Tesseract, then binarize first; if Tesseract does a better job, then pass it the grayscale image. You can decide that with simple tests of both cases.
 
I myself never want a client to pass me binarized (1 bpp) images in any image processing application (OCR, 2D barcode, image detection, etc.), because I lose a lot of information and have to build on their mistakes if I accept their binarized images.
 
Hussein Al-Hussein

NA

Apr 4, 2008, 1:25:46 PM
to tesseract-ocr
So Tesseract binarizes grayscale images before processing them.
OK, so which would you expect to do a better job, ImageMagick or
Tesseract? I don't really have the data to test this.


Hussein

Apr 4, 2008, 1:42:14 PM
to tesser...@googlegroups.com
Others who are more expert in Tesseract can help you better. I am an OCR expert, but I have my own OCR engine; I do not use Tesseract, though I sometimes compare against it.
 
As I said, you have to try a few pages in Tesseract without converting them to binary. Then convert them to binary using your tool (you said you use ImageMagick) and run Tesseract on the binarized images. Compare the results to see which does a better job for your particular images. The result may change for images with a different scanning quality.
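
One way to put a number on that comparison, as a minimal sketch: type in the correct text for a page by hand, then measure how far each OCR output is from it. The sketch below uses character error rate (edit distance divided by the length of the correct text), which is a common OCR metric but not something this thread prescribes. The sample strings and the names `from_grayscale` and `from_1bpp` are made up for illustration; in practice they would be the text Tesseract produced for each version of the image.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # drop ca
                            curr[j - 1] + 1,      # add cb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, truth: str) -> float:
    """Edit distance normalised by the length of the hand-typed truth."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ocr_text, truth) / len(truth)


# Hypothetical data for one page (not real Tesseract output).
truth = "The quick brown fox"
from_grayscale = "The quick brown fox"
from_1bpp = "The qu1ck brovvn fox"

print("grayscale input:", char_error_rate(from_grayscale, truth))
print("1 bpp input:    ", char_error_rate(from_1bpp, truth))
```

The lower number wins for that page. Since results may differ with scan quality, it is worth averaging over several pages that are representative of the real input.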

Hussein

Scan...@gmail.com

Apr 4, 2008, 3:00:32 PM
to tesseract-ocr
Tesseract applies an adaptive threshold to grayscale images, and there
is at least some code for color images in it. I am not sure why your
color image does not work; it might be that libtiff is not enabled
(see the wiki). An adaptive threshold is better than a straight
conversion to 1 bpp because it picks the threshold from the image
itself to get the best recognition. Adaptive thresholding algorithms
can even be written to vary the threshold region by region or
character by character, but I don't think Tesseract goes down to that
level.
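
To illustrate the difference between one threshold for the whole image and a threshold chosen region by region, here is a minimal sketch. It uses Otsu's method, a well-known way to pick a threshold from the grey-level histogram; that choice is an assumption for illustration, and nothing in this thread confirms which algorithm Tesseract or ImageMagick actually uses. The pixel values are made up: one well-lit region and one region in shadow.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    Pixels <= t form one class (e.g. ink), pixels > t the other (e.g. paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of grey levels at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (dark, <= t) or 1 (light, > t)."""
    return [1 if p > t else 0 for p in pixels]


# Made-up 8 bpp samples: two dark "ink" pixels, then two "paper" pixels.
well_lit = [30, 40, 200, 210]
in_shadow = [10, 15, 90, 100]

# One threshold for the whole image.
global_t = otsu_threshold(well_lit + in_shadow)
print("global:", binarize(well_lit, global_t), binarize(in_shadow, global_t))

# One threshold per region.
for region in (well_lit, in_shadow):
    print("local: ", binarize(region, otsu_threshold(region)))
```

With these sample values the single global threshold lands above the shaded paper pixels, so the whole shadowed region comes out dark, while the per-region thresholds separate ink from paper in both regions. That loss is also why a 1 bpp image cannot be re-thresholded later: the grey levels needed to choose a better threshold are gone.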
