Can TessBaseAPI::ProcessPages be stopped using ETEXT_DESC?

flavi...@gmail.com

Jun 21, 2022, 11:03:04 AM
to tesseract-ocr
Win10, VS2017, MFC C++ application, Tesseract 4.1.1

I started a TessBaseAPI::ProcessPages call and realized I couldn't stop it. Looking at the source of TessBaseAPI::ProcessPage, I noticed that if the int timeout_millisec parameter is greater than 0, the library internally uses an ETEXT_DESC monitor:

// from TessBaseAPI source code:

bool TessBaseAPI::ProcessPage(Pix* pix, int page_index, const char* filename,
                              const char* retry_config, int timeout_millisec,
                              TessResultRenderer* renderer) {
....
    PageIterator* it = AnalyseLayout();

    if (it == nullptr) {
      failed = true;
    } else {
      delete it;
    }
  } else if (tesseract_->tessedit_pageseg_mode == PSM_OSD_ONLY) {
    failed = FindLines() != 0;
  } else if (timeout_millisec > 0) {
    // Running with a timeout.
    ETEXT_DESC monitor;
    monitor.cancel = nullptr;
    monitor.cancel_this = nullptr;
    monitor.set_deadline_msecs(timeout_millisec);

As you can see, if timeout_millisec is greater than 0, a monitor is set up, but only to enforce a timeout, not to stop the process on demand (cancel and cancel_this are both set to nullptr).
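To make the distinction concrete, here is a minimal sketch of how a monitor of this shape is typically polled by a long-running loop. MonitorSketch and should_stop are hypothetical stand-ins written for this illustration; they are not Tesseract's ETEXT_DESC or its internal code, and only the field names cancel, cancel_this and set_deadline_msecs are taken from the excerpt above.

```cpp
#include <chrono>

// Callback shape assumed from the cancel() method used later in this thread.
using CancelFunc = bool (*)(void* cancel_this, int words);

// Hypothetical stand-in for the monitor fields quoted above (NOT ETEXT_DESC).
struct MonitorSketch {
    CancelFunc cancel = nullptr;   // on-demand cancellation callback
    void* cancel_this = nullptr;   // opaque pointer handed back to the callback
    std::chrono::steady_clock::time_point deadline =
        std::chrono::steady_clock::time_point::max();  // "no deadline"

    void set_deadline_msecs(int msecs) {
        deadline = std::chrono::steady_clock::now() +
                   std::chrono::milliseconds(msecs);
    }
};

// Returns true when the work loop should stop: either the deadline has
// passed, or a cancel callback is installed and it asks to stop.
inline bool should_stop(const MonitorSketch& m, int words) {
    if (std::chrono::steady_clock::now() > m.deadline) {
        return true;
    }
    return m.cancel != nullptr && m.cancel(m.cancel_this, words);
}
```

With cancel left at nullptr, as in the ProcessPage excerpt, only the deadline branch can ever fire, which matches the behaviour described above.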

My question is: can TessBaseAPI::ProcessPages be stopped somehow, maybe using ETEXT_DESC?

I have tried this (ReadImageThread runs on its own worker thread, not on the main thread):

void CMyDocEx::ReadImageThread(const CString& sSrcFile)
{
    do
    {
        std::shared_ptr<tesseract::TessBaseAPI> api = std::make_shared<tesseract::TessBaseAPI>();
        if (api->Init(CStringA(GetAppPathTemp()), "eng"))
        {
            break;
        }

        api->Recognize(&m_monitor);

        std::shared_ptr<tesseract::TessPDFRenderer> renderer =
                std::make_shared<tesseract::TessPDFRenderer>(
                CStringA(GetFileName(sSrcFile)), api->GetDatapath(), false);

        if (! api->ProcessPages(CStringA(sSrcFile), nullptr, 0, renderer.get()))
        {
            break;
        }
        api->End();
    } while (FALSE);
}

where m_monitor is defined in my CMyDocEx header:

class CMyDocEx : public CDocument
{
....
protected:
    static bool cancel(void* cancel_this, int words)
    {
        return m_bCancelFlag;
    }
....
protected:
    ETEXT_DESC m_monitor;
    static bool m_bCancelFlag;
....
};

and implementation file (cpp):

bool CMyDocEx::m_bCancelFlag = false;

CMyDocEx::CMyDocEx()
{
    // TODO: add one-time construction code here

    m_monitor.cancel = &CMyDocEx::cancel;
    m_monitor.cancel_this = reinterpret_cast<void*>(CMyDocEx::m_bCancelFlag);
}

and when I set m_bCancelFlag to true, nothing seems to happen ...
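As a side note on the flag itself: the constructor above converts the flag's value, not its address, into cancel_this, and a plain bool written by the UI thread and read by the worker thread is a data race. A minimal sketch of one alternative follows, assuming the callback signature bool(void*, int) shown in the post; cancel_requested is a hypothetical name, and this does not by itself make ProcessPages cancellable.

```cpp
#include <atomic>

// Same shape as the cancel() callback in the post: bool(void*, int).
// cancel_this is expected to point at a std::atomic<bool> owned by the
// caller, so no static member is needed and cross-thread access is safe.
inline bool cancel_requested(void* cancel_this, int /*words*/) {
    const auto* flag = static_cast<const std::atomic<bool>*>(cancel_this);
    return flag != nullptr && flag->load(std::memory_order_relaxed);
}

// Intended wiring (assumption, mirroring the fields used in the post):
//   std::atomic<bool> m_cancelFlag{false};   // member of the document class
//   monitor.cancel      = &cancel_requested;
//   monitor.cancel_this = &m_cancelFlag;     // address of the flag, not its value
```

The UI thread would then call m_cancelFlag.store(true) to request a stop.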

flavi...@gmail.com

Jun 22, 2022, 7:57:12 AM
to tesseract-ocr
I also tried this (inside the thread method):

        ETEXT_DESC monitor;
        monitor.cancel = &CMyDocEx::cancel;
        monitor.cancel_this = this;
        api->Recognize(&monitor);


        std::shared_ptr<tesseract::TessPDFRenderer> renderer =
                std::make_shared<tesseract::TessPDFRenderer>(
                CStringA(sSrcFile), api->GetDatapath(), false);


        if (! api->ProcessPages(CStringA(sSrcFile), nullptr, 0, renderer.get()))
        {
            break;
        }

and

    // CMyDocEx header:
class CMyDocEx : public CDocument
{
.....

    static bool cancel(void* cancel_this, int words)
    {
        return true;
    }

Is stopping api->ProcessPages a mission impossible?

flavi...@gmail.com

Jun 22, 2022, 11:21:55 AM
to tesseract-ocr
I think I've found the answer: api->ProcessPages cannot be stopped.

I have tried the following code:

        api->SetPageSegMode(tesseract::PSM_AUTO);
        Pix* image = pixRead(CStringA(sSrcFile));
        api->SetImage(image);


        ETEXT_DESC monitor;
        monitor.cancel = &CMyDocEx::cancel;
        api->Recognize(&monitor);

        pixDestroy(&image);


With the following cancel method:

    static bool cancel(void* cancel_this, int words)
    {
        return m_bCancelFlag;
    }

Of course, I set m_bCancelFlag from my Escape key handler (or whatever). With that, api->Recognize stops properly, on demand. But this is not the case for api->ProcessPages. Is there a missing feature here?
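Since api->Recognize accepts a monitor while ProcessPages does not expose one, one possible workaround is to drive the pages yourself and check the flag between pages. The sketch below shows only the loop structure; process_pages_cancellable and the recognize_page callback are hypothetical names. In a real program the callback would presumably call api->SetImage, api->Recognize(&monitor) and renderer->AddImage(api), with renderer->BeginDocument and renderer->EndDocument around the loop. That wiring is an assumption and is untested against 4.1.1.

```cpp
#include <atomic>
#include <functional>

// Runs recognize_page(page) for pages 0..page_count-1 and returns how many
// pages completed. Stops early when cancel_flag is set or a page fails.
inline int process_pages_cancellable(
        int page_count,
        const std::atomic<bool>& cancel_flag,
        const std::function<bool(int)>& recognize_page) {
    int done = 0;
    for (int page = 0; page < page_count; ++page) {
        if (cancel_flag.load()) {
            break;  // cancellation requested between pages
        }
        if (!recognize_page(page)) {
            break;  // page failed, or was cancelled from inside Recognize
        }
        ++done;
    }
    return done;
}
```

Checking the flag between pages bounds the delay to one page; passing a monitor to Recognize inside the callback is what would allow stopping in the middle of a page.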

Zdenko Podobny

Jun 23, 2022, 5:48:57 AM
to tesser...@googlegroups.com
I just tried this Qt5 & Tesseract example, https://github.com/sashoalm/TesseractGui, and cancel works there:
[attached image: TesseractGui_nVmZ16jgqG.gif]

If you are interested in a progress bar, have a look at this example:


Zdenko

