Hi everyone,
I know this is quite an old topic by now, but this question still stands and I saw no reason to create a new one for it.
I use tesseract 3.0.2 (with leptonica 1.67, which was the recommended version at the time of installation) on CentOS 6.5. I convert large PDF files to separate per-page PNGs, then run tesseract on each page and scan the output for specific keywords.
A few pages have given me the following errors (the errors always come together):
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
These pages seem to be OCRed correctly, with more or less the same precision as the rest of the pages (~96% of characters recognised), but I have only found three pages with these errors, so my sample is too small to draw conclusions from.
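In case it helps anyone reproduce this, here is a minimal sketch of how the affected pages could be flagged automatically. It assumes the errors are printed on tesseract's stderr and that the plain `tesseract <image> <outputbase>` invocation is used, which writes `<outputbase>.txt`. The function names, the `pages` directory and the keyword list are hypothetical, not part of my actual setup.

```python
import subprocess
from pathlib import Path

# The two error prefixes quoted above; both are assumed to appear on stderr.
LEPTONICA_ERRORS = (
    "Error in boxClipToRectangle",
    "Error in pixScanForForeground",
)


def leptonica_errors(stderr_text):
    """Return the lines of stderr_text that start with a known error prefix."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith(LEPTONICA_ERRORS)
    ]


def ocr_page(png_path, keywords):
    """OCR one page; return (keywords found, error lines seen on stderr)."""
    png_path = Path(png_path)
    out_base = png_path.with_suffix("")  # tesseract appends ".txt" itself
    result = subprocess.run(
        ["tesseract", str(png_path), str(out_base)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    text = Path(str(out_base) + ".txt").read_text(encoding="utf-8", errors="replace")
    found = [k for k in keywords if k.lower() in text.lower()]
    return found, leptonica_errors(result.stderr)


# Hypothetical usage: list every page in ./pages that triggered the errors.
# for png in sorted(Path("pages").glob("*.png")):
#     found, errors = ocr_page(png, ["invoice", "total"])
#     if errors:
#         print(png.name, errors)
```

Collecting the flagged pages this way would make it easier to compare their accuracy against the rest once there are more than three of them.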
What do these errors mean?
Do they hint at a user error?
Is there any chance that they indicate a loss of precision?
Thanks in advance for the help. :)