I ran this test with the official 64-bit Tesseract 5.0 alpha build for Windows. The document is a single-page TIFF image of a noisy engineering drawing. With page segmentation mode 6, the file took 30 minutes to process. I then tried mode 11 to look for sparse text, and the processing time increased to over one hour.
Normally, I wouldn't attempt to OCR a file like this. However, our project involves a large number of scanned images, and it is impractical to examine the files individually.
Is there a way to set a timeout or get some preliminary data during segmentation so that we can detect and skip such noisy files?
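For context, the kind of timeout I have in mind can be approximated from outside the engine. The following is only a minimal sketch, assuming the `tesseract` command-line executable is on the PATH and that killing the whole process is acceptable. The helper names `run_with_timeout` and `ocr_or_skip` and the 120-second default are hypothetical, chosen for illustration.

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its CompletedProcess, or None if it exceeds timeout_s seconds."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no stray OCR process is left running at this point.
        return None

def ocr_or_skip(image_path, output_base, psm=6, timeout_s=120):
    """Return True if OCR finished in time with exit code 0, False if the file should be skipped.

    Hypothetical wrapper: assumes the standard CLI form
    `tesseract <image> <outputbase> --psm <mode>`.
    """
    cmd = ["tesseract", image_path, output_base, "--psm", str(psm)]
    result = run_with_timeout(cmd, timeout_s)
    return result is not None and result.returncode == 0

# Example use (not executed here):
#   if not ocr_or_skip("drawing.tif", "drawing_out", psm=6, timeout_s=120):
#       print("skipped: too slow or failed")
```

This only bounds wall-clock time per file. It does not give any preliminary data from segmentation, which is what I would really like to have.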
Also, when we run this file in a custom program with the Tesseract monitor class enabled, the engine reaches 100% progress in 10 to 20 minutes but then gets stuck there, presumably while formatting the results list. It detects roughly 30,000+ symbols, mostly non-words.
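To make "mostly non-words" concrete, here is a rough sketch of the kind of check that could flag such a page once some recognized tokens are available, for example from a run that did finish. Everything in it is an assumption for illustration: the function name `looks_noisy`, the thresholds, and the very crude definition of a word-like token (two or more ASCII letters, or a plain number).

```python
import re

# Crude notion of "word-like": 2+ ASCII letters, or a number such as 12 or 3.5.
WORDLIKE = re.compile(r"[A-Za-z]{2,}|\d+(?:[.,]\d+)?")

def looks_noisy(tokens, max_tokens=5000, min_word_ratio=0.5):
    """Return True if a token list looks like OCR noise rather than text.

    tokens: list of strings recognized on one page.
    max_tokens and min_word_ratio are hypothetical thresholds to tune per project.
    """
    if len(tokens) > max_tokens:
        return True   # far more symbols than a normal page would have
    if not tokens:
        return False  # an empty page is not noisy, just empty
    # Strip common surrounding punctuation before testing each token.
    wordlike = sum(1 for t in tokens if WORDLIKE.fullmatch(t.strip(".,;:()")))
    return wordlike / len(tokens) < min_word_ratio
```

A check like this only helps after recognition has produced output, so it does not solve the case where the engine never returns.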
Please note that the attachment is only a screenshot due to copyright issues. The actual file is a G4-compressed TIFF of about 3.5 MB.