Any way to stop ocr after set time period?

Ajg

Apr 3, 2025, 12:12:52 PM
to tesseract-ocr
I have an OCR program that reads and interprets many documents of varying composition. Some are PDFs whose first page is an image, with the text on the second (or later) pages. When the first page is an image with little or no text, the GetText() call can take several minutes or more just to get past that page. PDF pages with lots of text work fine. The application is a .NET WinForms program.

The relevant code in C# is:
// Convert the bitmap to a Pix before passing it to the engine.
using var img = PixConverter.ToPix(save_bitmap);
var ocr = new TesseractEngine(..."tessdata5.2",
                              "eng",
                              EngineMode.LstmOnly);
using var page = ocr.Process(img, PageSegMode.AutoOsd);
ocrtext = page.GetText();   /* long time here */

I do need to collect text from subsequent pages for indexing documents.

Thanks in advance for any comments you may have. 

Zdenko Podobny

Apr 4, 2025, 1:46:12 AM
to tesser...@googlegroups.com
See the comment in the APIExample page of the Tesseract documentation.
There is also a function, set_deadline_msecs.

I am not sure whether it is exposed in the C# wrapper.
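
If the wrapper turns out not to expose a deadline, one possible fallback is a time limit on the caller's side. The sketch below is only an illustration under stated assumptions: OcrTimeout and TryRunWithTimeout are hypothetical names, not part of Tesseract or of any wrapper, and the slow OCR call is simulated with Thread.Sleep. It stops waiting after the timeout but does not stop the work itself, which keeps running in the background.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class OcrTimeout
{
    // Hypothetical helper (not a Tesseract API): runs `work` on a thread-pool
    // thread and waits at most `timeout` for it to finish.
    // Returns true and sets `result` if the work completed in time.
    // Returns false on timeout; the work is NOT cancelled and keeps running.
    public static bool TryRunWithTimeout<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        Task<T> task = Task.Run(work);
        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }
        result = default;
        return false;
    }

    public static void Main()
    {
        // Stand-in for a page with lots of text: finishes well within the limit.
        bool fastOk = TryRunWithTimeout(() => "some text",
                                        TimeSpan.FromSeconds(5), out string fastText);
        Console.WriteLine($"fast page finished: {fastOk}");

        // Stand-in for a slow image-only page: sleeps longer than the limit.
        bool slowOk = TryRunWithTimeout(() => { Thread.Sleep(2000); return "late"; },
                                        TimeSpan.FromMilliseconds(100), out string slowText);
        Console.WriteLine($"slow page finished: {slowOk}");
    }
}
```

In the code from the question, the lambda would wrap page.GetText(). Because the abandoned call is still executing, the page and engine should not be disposed or reused until it returns. If a hard stop is needed, running the OCR in a separate process that can be killed may be the safer design.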

Zdenko



Ajg

Apr 4, 2025, 10:22:32 AM
to tesseract-ocr
Thanks for the tip. I'll look into this.