Extraction error for PNG files - pixCreateNoInit: pix_malloc fail for data


Gaurav Verma

Jul 28, 2022, 4:48:48 AM
to tesseract-ocr
Hi,
I am trying to extract text from some PNG images on Windows Server 2019 Standard using Tesseract OCR 5.0.1, but I am getting image validation errors.

Test Image 1: Image1.png
Properties:
    Dimensions   25500 x 44738
    Width        25500 pixels
    Height       44738 pixels
    Bit depth    24
    Size         42.4 MB

! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser bad exit value 1 err msg: Error in pixCreateHeader: requested w = 25500, h = 44738, d = 32
! Error in pixCreateHeader: requested bytes >= 2^31
! Error in pixCreateNoInit: pixd not made
! Error in pixCreate: pixd not made
! Error in pixSetInputFormat: pix not defined
! Error in pixReadStreamJpeg: rowbuffer or pix not made
! Error in pixReadStream: jpeg: no pix returned
! Error in pixRead: pix not read
! Error during processing.


Test Image 2: Image2.png
Properties:
    Dimensions   35700 x 6599
    Width        35700 pixels
    Height       6599 pixels
    Bit depth    24
    Size         50.6 MB
    Resolution   608 DPI

! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser bad exit value 1 err msg: Error in pixCreateNoInit: pix_malloc fail for data
! Error in pixCreate: pixd not made
! Error in pixReadStreamPng: pix not made
! Error in pixReadStream: png: no pix returned
! Error in pixRead: pix not read
! Error during processing.


Thanks in advance for any help or hints.

- Gaurav

Zdenko Podobny

Aug 8, 2022, 3:24:59 PM
to tesser...@googlegroups.com
Try the latest Tesseract and Leptonica versions; there were some improvements for large images.
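
For reference, the arithmetic behind the two error logs can be checked with a short sketch. It assumes the image is decoded to an uncompressed raster at the 32 bits per pixel shown in the first log (d = 32), with each row padded to a whole number of 32-bit words, and it compares the result with the 2^31 byte ceiling quoted in the pixCreateHeader message. The function names `raster_bytes` and `max_scale` are made up for this illustration and are not part of Tesseract or Leptonica.

```python
import math

LIMIT = 2**31  # byte ceiling quoted in the pixCreateHeader error message


def raster_bytes(width, height, depth_bits=32):
    """Bytes for an uncompressed raster whose rows are padded to 32-bit words."""
    words_per_line = (width * depth_bits + 31) // 32
    return 4 * words_per_line * height


def max_scale(width, height, depth_bits=32, limit=LIMIT):
    """Approximate largest uniform scale factor (at most 1.0) that stays under limit."""
    ratio = (limit - 1) / raster_bytes(width, height, depth_bits)
    return min(1.0, math.sqrt(ratio))


# Dimensions taken from the original post.
for name, w, h in [("Image1.png", 25500, 44738), ("Image2.png", 35700, 6599)]:
    size = raster_bytes(w, h)
    print(f"{name}: {size:,} bytes, over limit: {size >= LIMIT}, "
          f"max scale: {max_scale(w, h):.3f}")
```

Under these assumptions Image1.png needs 4,563,276,000 bytes, which is over 2^31 = 2,147,483,648 and matches the "requested bytes >= 2^31" message. Image2.png needs 942,337,200 bytes, which is under that ceiling, so the pix_malloc failure there suggests the allocation itself could not be satisfied (for example, not enough free memory, or a 32-bit build) rather than a size check being hit. If upgrading does not help, downscaling the image before OCR by roughly the factor printed above is one possible workaround, though I have not tested it on these files.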
