Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...


Hari.K

Jun 8, 2017, 9:55:04 AM
to tesseract-ocr
Hi There,

    I sometimes receive an error - "Failed to create pix, this normally occurs because the requested image size is too large, please check Standard Error Output" when doing OCR on a bitmap image.


The highlighted line below is where it breaks for me:

            Bitmap bitmap;
            Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);

            // Page indexes are zero-based, so the loop stops before Pages.Count.
            for (int i = 0; i < document.Pages.Count; i++)
            {
                // 200 is the DPI (horizontal and vertical) that I am setting for the bitmap image
                bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
                ...................
                .................

            }

More details on what I am trying to do here:
1) Upload a PDF document that is barely 600 KB
2) Iterate through each PDF page and convert it into a Bitmap image
3) Pass this Bitmap image to Tesseract to perform OCR

Please note that I don't get this error often. Any ideas on why it occurs, given that I do not receive it every time?

Looking forward to some input on this.

Thanks in Advance,
Hari


ShreeDevi Kumar

Jun 8, 2017, 11:46:15 AM
to tesser...@googlegroups.com
Have you tried using ghostscript to convert the PDF to TIFF files instead? Example commands:

gs   -r600x600 -sDEVICE=tiffg4   -dFirstPage=106  -dLastPage=109    -o ./tulasi/tulasikrishna%00d.tif  "TulasiPuja.pdf"

for one tif per page

gs   -r600x600 -sDEVICE=tiffg4   -dFirstPage=126  -dLastPage=131    -o ./tulasi/tulasIviShNupUjA.tif  "TulasiPuja.pdf"

for multipage tif

You can reduce the resolution to -r300x300.

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/dcfe7918-707b-4b56-9720-b3e39ae1a658%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hari.K

Jun 9, 2017, 1:14:23 AM
to tesseract-ocr
Thank you, Shree, for replying. Yes, I know about ghostscript and its commands, but the present architecture of our project does not allow us to accommodate ghostscript commands. I am also aware of "gsdll32.dll", but it is not a .NET managed library, so we can't reference it directly in a project and would have to go through P/Invoke. For all of these reasons and limitations, we have to stay away from ghostscript.

Do you think there are any better alternative libraries I could use to avoid the error I mentioned in this post?

Thanks in Advance,
Hari

ShreeDevi Kumar

Jun 9, 2017, 2:29:08 AM
to tesser...@googlegroups.com, Quan Nguyen
+ quan

Quan will be better able to advise regarding .NET.


ShreeDevi

Hari.K

Jun 12, 2017, 1:55:44 AM
to tesseract-ocr, nguy...@gmail.com
Thanks Shree.

Hello Quan,

Here are my further updates and observations on this post:

- The error I mentioned in this post actually occurs on the `Pix pix = b.Convert(bitmap);` line in the code below.
- As per my analysis, when a bitmap image is newly created with dimensions exceeding 1900 x 2475, converting that bitmap to a Pix on the next line produces the error I described in the post.


            // Page indexes are zero-based, so the loop stops before Pages.Count.
            for (int i = 0; i < document.Pages.Count; i++)
            {
                bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);

                BitmapToPixConverter b = new BitmapToPixConverter();
                Pix pix = b.Convert(bitmap);   // the error is thrown here
                .........
            }
So, as I understand it, Tesseract is unable to convert the bitmap because its dimensions are too large, and it throws the error discussed in this post.

Can anyone confirm whether Tesseract has this kind of limitation when converting large bitmaps?

Is the only way around this issue to resize the bitmap before converting it to a Pix? Am I heading in the right direction? Any other ideas, please?
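
To put the dimensions in this post in perspective, here is a minimal sketch of the arithmetic. It assumes a US Letter page (8.5 x 11 in) and 32 bits per pixel; neither is stated in the thread, and `PageMath` with its methods is a hypothetical helper written only for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of any library).
public static class PageMath
{
    // Pixel extent of one page edge rendered at the given DPI.
    public static int PixelsFor(double inches, int dpi)
    {
        return (int)Math.Round(inches * dpi);
    }

    // Approximate size in bytes of the raw, uncompressed pixel data.
    public static long RawBytes(int width, int height, int bitsPerPixel)
    {
        return (long)width * height * bitsPerPixel / 8;
    }

    public static void Main()
    {
        // Assumed page size: US Letter at the 200 DPI used in the post.
        int w = PixelsFor(8.5, 200);
        int h = PixelsFor(11.0, 200);
        Console.WriteLine(w + " x " + h);                 // 1700 x 2200
        Console.WriteLine(RawBytes(w, h, 32));            // 14960000
        // The threshold reported in the post, assuming 32 bits per pixel.
        Console.WriteLine(RawBytes(1900, 2475, 32));      // 18810000
    }
}
```

Under these assumptions a 1900 x 2475 page holds roughly 18 MB of raw pixel data, which is modest for a single image. It may therefore be worth checking how many such images are alive at the same time, not only the size of one.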

Thanks in Advance,
Hari

ShreeDevi Kumar

Jun 12, 2017, 3:17:27 AM
to tesser...@googlegroups.com, dan bloomberg
Image processing within Tesseract is done by Leptonica.


+ dan bloomberg



ShreeDevi

ShreeDevi Kumar

Jun 12, 2017, 10:47:55 AM
to Dan Bloomberg, tesser...@googlegroups.com, hari...@gmail.com
Thanks, Dan.

Forwarding your message to the group and original poster - who was getting errors with large bitmaps

>>when a bitmap image is created newly, and if the image dimensions are exceeding 1900 x 2475, and in the next line when the same bitmap is being tried to convert to Pix then at that point of time, I am getting the error which I was talking about in the post. 

On Mon, Jun 12, 2017 at 7:52 PM, Dan Bloomberg <dan.bl...@gmail.com> wrote:
>>     BitmapToPixConverter b = new BitmapToPixConverter();
>>     Pix pix = b.Convert(bitmap);

This is not leptonica code. It shouldn't compile, with b being a ptr that is dereferenced with a ".". This is then set equal to a pix which is (as written) not a ptr either, causing a copy if it were correct.

ShreeDevi Kumar

Jun 12, 2017, 10:59:42 AM
to Dan Bloomberg, tesser...@googlegroups.com, hari...@gmail.com
Hari,

Please also look in the leptonica program directory for pdf2tiff, pdf2mtiff, etc.

THintz

Jun 12, 2017, 1:16:53 PM
to tesseract-ocr, dan.bl...@gmail.com, hari...@gmail.com
That's charlesw's .NET Tesseract/Leptonica wrapper code. One problem is that "Pix" implements IDisposable and must be disposed.

THintz

Jun 12, 2017, 1:31:34 PM
to tesseract-ocr, dan.bl...@gmail.com, hari...@gmail.com
This is the Charles Weld .NET wrapper code. The first thing Convert() does is call this method. Leptonica's pixCreate() apparently returns a null pointer.



       
        public static Pix Create(int width, int height, int depth)
        {
            if (!AllowedDepths.Contains(depth))
                throw new ArgumentException("Depth must be 1, 2, 4, 8, 16, or 32 bits.", "depth");

            if (width <= 0) throw new ArgumentException("Width must be greater than zero", "width");
            if (height <= 0) throw new ArgumentException("Height must be greater than zero", "height");

            var handle = Interop.LeptonicaApi.Native.pixCreate(width, height, depth);
            if (handle == IntPtr.Zero) throw new InvalidOperationException("Failed to create pix, this normally occurs because the requested image size is too large, please check Standard Error Output.");

            return Create(handle);
        }



Quan Nguyen

Jun 12, 2017, 3:39:09 PM
to tesseract-ocr
Leptonica provides many different methods for creating a Pix object. You can read one from a file, a memory buffer, etc. So you may need to write your bitmap to such an intermediate format and read it back as a Pix.

pixRead
pixReadMem
pixReadMemPng

Check the Leptonica API documentation for details.
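
As a rough sketch of the intermediate-format idea, the bitmap can be saved to a temporary PNG file and read back through the .NET wrapper. This assumes the charlesw wrapper's `Pix.LoadFromFile`, which to my knowledge calls Leptonica's `pixRead`; the class name `PixLoading` and the method `FromBitmapViaFile` are hypothetical, and the temp-file handling is just one possible choice.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Tesseract;

// Hypothetical helper, for illustration only.
public static class PixLoading
{
    // Saves the bitmap as a PNG in the temp directory and lets Leptonica
    // read it back, instead of converting the Bitmap in memory.
    // The caller owns the returned Pix and must Dispose() it.
    public static Pix FromBitmapViaFile(Bitmap bitmap)
    {
        string path = Path.Combine(Path.GetTempPath(),
                                   Path.GetRandomFileName() + ".png");
        try
        {
            bitmap.Save(path, ImageFormat.Png);
            return Pix.LoadFromFile(path);
        }
        finally
        {
            // The file is only needed while it is being read.
            if (File.Exists(path)) File.Delete(path);
        }
    }
}
```

A typical call would be `using (Pix pix = PixLoading.FromBitmapViaFile(bitmap)) { ... }`. Going through a file costs some disk I/O per page, so this is more of a diagnostic alternative than an obvious improvement.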

tdh...@gmail.com

Jun 12, 2017, 8:29:04 PM
to tesser...@googlegroups.com
The more I think about this the more it makes sense it's just running out of memory because pix didn't get disposed.
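
If the missing disposal is indeed the cause, a minimal sketch of the loop with deterministic cleanup could look like the following. The Spire.Pdf and wrapper calls are the ones quoted in this thread plus `TesseractEngine`, `Process` and `GetText` from the same wrapper; the method name `OcrAllPages`, the `tessdataPath` parameter and the "eng" language are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.Graphics;
using Tesseract;

// Hypothetical helper, for illustration only.
public static class PdfOcr
{
    // Renders each page at 200 DPI, runs OCR on it, and releases the
    // bitmap, the Pix and the result page before moving to the next page.
    public static List<string> OcrAllPages(string pdfPath, string tessdataPath)
    {
        var texts = new List<string>();
        var document = new PdfDocument(pdfPath);
        try
        {
            using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
            {
                for (int i = 0; i < document.Pages.Count; i++)
                {
                    // Each using block disposes its object at the end of the
                    // iteration, so native memory does not pile up across pages.
                    using (var bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200))
                    using (Pix pix = new BitmapToPixConverter().Convert(bitmap))
                    using (Page page = engine.Process(pix))
                    {
                        texts.Add(page.GetText());
                    }
                }
            }
        }
        finally
        {
            document.Close();
        }
        return texts;
    }
}
```

The point of the nested `using` statements is that `Bitmap`, `Pix` and `Page` all hold unmanaged memory, which the garbage collector may release much later than the loop needs it. Whether this removes the error in the original setup would still have to be confirmed by testing.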