Problem with colored Tif Images

Edson Luis Moretti

Mar 17, 2016, 12:27:55 PM
to tesseract-ocr
Hello everyone!

I'm having problems with color TIFF images like the one in the attached file.

When I run > tesseract.exe 00000008.tif output pdf
I get message boxes with the following messages:

unknow field with tag 512 (0x200) encountered
unknow field with tag 513 (0x201) encountered
unknow field with tag 514 (0x202) encountered
unknow field with tag 519 (0x207) encountered
unknow field with tag 520 (0x208) encountered
unknow field with tag 521 (0x209) encountered
Old-style JPEG compression support is not configured
Sorry, requested compression method is not configured

Sometimes the PDF output is also broken.

With B&W TIFFs it works fine, and if I open these images in Paint and save them as JPG, that also works fine.
I'm using version 3.0.4, from github > downloads > windows > 3rd party exe installers > binaries by @egorpugin (Link)

Has anyone had this issue? What could it be?

Thanks in advance!
Edson Luis Moretti.
00000008.tif

Tom Morris

Mar 18, 2016, 12:04:04 PM
to tesseract-ocr
I doubt the problem is that they're color, but rather that the producer software is creating bad TIFF files. It is possible to build libTIFF with so-called "Old style JPEG" (OJPEG) support, but it looks like that hasn't been done in this case.

That file doesn't open in GMail's preview or the OS X Preview app, so Tesseract isn't the only piece of software that's unhappy with the illegal TIFF. You might be able to use ImageMagick or some other utility to convert them to an acceptable format.
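
If it helps to confirm which files are affected before converting them, here is a minimal sketch that reads the Compression tag from the first image directory of a TIFF. It relies only on values from the TIFF 6.0 spec: tag 259 is Compression, value 6 is the old-style JPEG scheme and 7 is the newer JPEG scheme. The tags in the warnings above (512-514, 519-521) are the JPEG fields that go with old-style JPEG. The function name tiff_compression is made up for illustration, and BigTIFF and multi-page files are not handled.

```python
import struct

COMPRESSION_TAG = 259  # "Compression" in the TIFF 6.0 spec
OJPEG = 6              # old-style JPEG, the scheme libTIFF is complaining about


def tiff_compression(data: bytes) -> int:
    """Return the Compression value from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _field_type, _n = struct.unpack_from(endian + "HHI", data, entry)
        if tag == COMPRESSION_TAG:
            # Compression is a SHORT held in the first 2 bytes of the value field
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1  # the spec's default (no compression) when the tag is absent
```

Reading a file's bytes with open(path, "rb") and passing them to tiff_compression should return 6 for files like the one attached, if the diagnosis above is right. Those are the ones to route through a converter first.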

Tom 

Edson Luis Moretti

Mar 18, 2016, 12:57:36 PM
to tesseract-ocr
Yes, I suspected the same. When I wrote the post, I saw that GMail's preview couldn't open the image.

I have already converted the images so that they work with Tesseract, but the original file is 353 KB and the converted output is about twice as large.
I need the same or almost the same file size as the originals. I tried converting to JPEG or TIFF again and compressing the images in VB.Net, but to get the same file size I lost a lot of quality.

Anyway, thanks for your answer; this problem has nothing to do with Tesseract.
I will try to find a way to convert/compress without losing too much quality.

Edson.

Robert Komar

Mar 21, 2016, 6:50:22 PM
to tesseract-ocr
On Fri, 18 Mar 2016, Edson Luis Moretti wrote:

> Yes, I was wondering the same. When I did the post I saw
> that the GMail's preview didn't open the image. I already
> did convert them to work with Tesseract but the original
> file has 353kb and converting them the output is like
> twice bigger.
> I need the same or almost the same file size of these
> original ones, I tried to convert to JPEG or Tif again and
> compress these images in VB.Net but to get the same file
> size I lost a lot of quality.
>
> Anyway, thanks for your answer, this problem has not to do
> with Tesseract.
> I will try to find a way to convert/compress without loose
> too much quality.
>
> Edson.

Since tesseract converts to black and white internally,
you could convert your images to black and white yourself,
making them smaller. Another advantage to doing this is
that you would have control over how the image is binarized,
rather than relying on tesseract to do a good job of it.
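
As a rough illustration of doing the binarization yourself, here is a minimal sketch of Otsu's method, one common global thresholding technique. It is chosen here only as an example and is not necessarily the best choice for your scans. The image is assumed to be already decoded to 8-bit grayscale and is represented as a plain list of rows. In practice you would load and save the pixels with an image library. The names otsu_threshold and binarize are made up for this example.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2  # unnormalized between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows):
    """Map a grayscale image (list of rows of 0-255 ints) to pure 0/255 values."""
    t = otsu_threshold([p for row in rows for p in row])
    return [[0 if p <= t else 255 for p in row] for row in rows]
```

Once the image holds only two values it can be stored at 1 bit per pixel, which is where the size saving comes from. How much smaller the file gets depends on the compression used when saving.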

Cheers,
Rob Komar