Hi,
I am trying to extract text from some PNG images on Windows Server 2019 Standard using Tesseract OCR 5.0.1 (called through Apache Tika's TesseractOCRParser), but I am getting image validation errors.
Test Image 1: Image1.png
Properties: Dimensions 25500 x 44738
Width 25500 pixels
Height 44738 pixels
Bit depth 24
Size 42.4 MB
! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser bad exit value 1 err msg: Error in pixCreateHeader: requested w = 25500, h = 44738, d = 32
! Error in pixCreateHeader: requested bytes >= 2^31
! Error in pixCreateNoInit: pixd not made
! Error in pixCreate: pixd not made
! Error in pixSetInputFormat: pix not defined
! Error in pixReadStreamJpeg: rowbuffer or pix not made
! Error in pixReadStream: jpeg: no pix returned
! Error in pixRead: pix not read
! Error during processing.
Test Image 2: Image2.png
Properties: Dimensions 35700 x 6599
Width 35700 pixels
Height 6599 pixels
Bit depth 24
Size 50.6 MB
Resolution 608 DPI
! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser bad exit value 1 err msg: Error in pixCreateNoInit: pix_malloc fail for data
! Error in pixCreate: pixd not made
! Error in pixReadStreamPng: pix not made
! Error in pixReadStream: png: no pix returned
! Error in pixRead: pix not read
! Error during processing.
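For reference, here is a rough check of the sizes involved. It is only a sketch: it assumes the decoded image takes 4 bytes per pixel (my reading of "d = 32" in the first error) and that the limit is the 2^31 bytes named in "requested bytes >= 2^31". The function name raster_bytes is just something I made up for this check.

```python
# Rough size check for the two test images.
# Assumption: 4 bytes per pixel, based on "d = 32" in the first error message.
LIMIT = 2**31  # threshold named in "requested bytes >= 2^31"


def raster_bytes(width, height, bytes_per_pixel=4):
    """Uncompressed size in bytes of a width x height raster."""
    return width * height * bytes_per_pixel


for name, w, h in [("Image1.png", 25500, 44738), ("Image2.png", 35700, 6599)]:
    size = raster_bytes(w, h)
    status = "over limit" if size >= LIMIT else "under limit"
    print(name, size, status)
```

Under that assumption, Image 1 comes to 4,563,276,000 bytes, which is above 2^31 (2,147,483,648). Image 2 comes to 942,337,200 bytes, which is below the limit but still needs roughly 900 MB in a single allocation, so that may be why it fails with pix_malloc instead.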
Thanks in advance for any help or hints.
- Gaurav