AdaptiveClassifierIsEmpty read-access violation


Darren Morby

Sep 22, 2022, 2:42:02 AM
to tesseract-ocr
This is in Tesseract 4.01.

I get a read-access violation in this function in classify.h:

  bool AdaptiveClassifierIsEmpty() const {
    return AdaptedTemplates->NumPermClasses == 0;
  }

This function does not check whether AdaptedTemplates is nullptr. It is called by Tesseract::recog_all_words, which in turn is called by TessBaseAPI::Recognize. Is there a function that I should call to make sure that the Tesseract object is initialized correctly?

I notice that only two functions actually make sure that AdaptedTemplates is not nullptr: InitForAnalysePage and FindLines().  Should I be calling one of these functions before Recognize?  (InitForAnalysePage curiously says that "Calls that attempt recognition will generate an error" but I don't see why.)

Thanks.

Zdenko Podobny

Sep 22, 2022, 2:15:53 PM
to tesser...@googlegroups.com
Tesseract 4.x is an old and unsupported version.

So it would be nice if you could provide a code example that uses only the public API and reproduces the read-access violation.
The function AdaptiveClassifierIsEmpty is not part of the public API (https://github.com/tesseract-ocr/tesseract/tree/main/include/tesseract).


Zdenko



Darren Morby

Sep 23, 2022, 4:47:49 PM
to tesseract-ocr
I said that the problem was in AdaptiveClassifierIsEmpty because Windows dumped the state of the process when the read-access violation occurred, and AdaptiveClassifierIsEmpty was the function at the top of the call stack. This was deep within a call to the public function Recognize.

I have since found these problems:

1. During the Init call, if the eng.traineddata file is not found, then init_tesseract_lang_data calls tprintf with an error message and returns -1. At this point, AdaptedTemplates is still a null pointer because its object is allocated later. I believe that this is not a bug.
2. We were not getting the error message generated by tprintf.  We have our own version of tprintf that sprintf's to a string (which is sent to a searchable logging system) rather than fprintf's to stderr.  But our own version wasn't getting linked in so the error message was lost.  I believe that this is a bug on our side.
3. We were calling Init but not checking the return value.  It appears to have returned -1 and we ignored it and called Recognize anyway.  I believe that this is a bug on our side.

I have resolved the bugs in 2 and 3 and things seem to be normal again.
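
For anyone who lands here with the same crash, here is a minimal sketch of the fix for item 3: check the return value of Init and skip recognition when it fails. To keep it self-contained it does not use Tesseract at all. MockEngine, Templates, IsInitialized and RecognizeIfReady are hypothetical names invented for this illustration, and the only thing borrowed from the thread is the convention that Init returns -1 on failure and leaves the templates pointer null.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the engine's adapted templates; not Tesseract code.
struct Templates {
  int NumPermClasses = 0;
};

// Hypothetical mock that mimics the failure mode described in this thread.
class MockEngine {
 public:
  // Returns 0 on success and -1 on failure, as described for Init above.
  // On failure the templates object is never allocated.
  int Init(bool data_file_found) {
    if (!data_file_found) {
      return -1;  // templates_ stays null, like AdaptedTemplates in the report
    }
    templates_ = std::make_unique<Templates>();
    return 0;
  }

  bool IsInitialized() const { return templates_ != nullptr; }

  // Dereferences the pointer without a check, so it is only safe to call
  // after a successful Init.
  bool AdaptiveClassifierIsEmpty() const {
    return templates_->NumPermClasses == 0;
  }

 private:
  std::unique_ptr<Templates> templates_;
};

// Runs the recognition step only if Init succeeded.
// Returns true if recognition ran; otherwise fills *error and returns false.
bool RecognizeIfReady(MockEngine& engine, bool data_file_found,
                      std::string* error) {
  if (engine.Init(data_file_found) != 0) {
    *error = "Init failed: language data not loaded";
    return false;  // never reach code that assumes a valid templates pointer
  }
  (void)engine.AdaptiveClassifierIsEmpty();  // safe: Init returned 0
  return true;
}
```

With a real engine the same pattern applies: treat a nonzero return from Init as fatal for that object, and log the failure yourself rather than relying only on the library's own error output reaching you.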

Thank you for your time.