Hello,
I ran into an assertion failure when running tesseract on a scanned image. The output I get is:
Page 1
Detected 146 diacritics
split_pt >0 && split_pt < word->chopped_word->NumBlobs():Error:Assert failed:in file ..\..\ccmain\tfacepp.cpp, line 186
I am testing on Windows and the tesseract version is:
tesseract 3.04.02dev
leptonica-1.71 (Oct 21 2016, 18:04:17) [MSC v.1800 DLL Release x86]
libgif 4.1.6(?) : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.8
The image is a black-and-white TIFF file, compressed with CCITT Group 4 fax encoding, at a resolution of 300 dpi and 2461 x 3478 pixels. The file size is 112K. Unfortunately I cannot attach the file.
The image is a form that includes some handwritten areas. If I redact the image by placing black boxes over every handwritten area, tesseract processes the file without crashing. So my first thought was that the problem is in the recognition of handwriting.
However, I also tried resizing the original (unredacted) image to 1920 x 2714, and tesseract processed that without crashing as well. So it seems the handwriting is no longer a problem once the image is slightly smaller.
I am trying to use tesseract in an automated system that processes scanned images. Any ideas on how to resolve this?
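As a stopgap for the automated system, the resizing workaround could be wrapped in a retry step. The following is only a minimal sketch under several assumptions: the function names `run_tesseract` and `ocr_with_fallback` are hypothetical, the downscaled copy of the image is produced separately by some other tool, and the assertion failure is assumed to surface as a non-zero exit code from the `tesseract` command, which I have not verified.

```python
import subprocess
from typing import Callable


def run_tesseract(image_path: str, output_base: str) -> int:
    """Run the tesseract CLI on one image and return its exit code.

    Assumption: a failed assertion makes the process exit with a
    non-zero code. This has not been verified for this build.
    """
    completed = subprocess.run(
        ["tesseract", image_path, output_base],
        capture_output=True,
        text=True,
    )
    return completed.returncode


def ocr_with_fallback(
    original: str,
    downscaled: str,
    output_base: str,
    runner: Callable[[str, str], int] = run_tesseract,
) -> str:
    """Try the original image first, then the downscaled copy.

    Returns the path of the image that was processed successfully.
    The downscaled copy is assumed to exist already (hypothetical
    preprocessing step, not shown here).
    """
    for candidate in (original, downscaled):
        if runner(candidate, output_base) == 0:
            return candidate
    raise RuntimeError("tesseract failed on both the original and the downscaled image")
```

This only hides the symptom: the smaller image is used solely when the full-size one fails, so recognition quality on the remaining pages is unaffected, but the underlying assertion is still not explained.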
Thank you very much
George