Tesseract works in debug, but fails in release


Minseok Kim

Dec 31, 2020, 3:22:02 PM
to tesseract-ocr
#include <iostream>
#include <string>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    string outText = "", imPath = "image.jpeg";
    Mat im = cv::imread(imPath, cv::IMREAD_GRAYSCALE);
    cv::bitwise_not(im, im);
    cv::imwrite("image_inverted.jpeg", im);

    tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();

    if (ocr->Init(NULL, "Impact"))
    {
        cout << "Failed to initialize." << endl;
    }
    else
    {
        //ocr->SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOP");
        ocr->SetVariable("user_defined_dpi", "96");
        //ocr->SetVariable("unlv_tilde_crunching", "false");
        //ocr->SetImage(im.data, im.cols, im.rows, 1, im.cols);
        Pix* image = pixRead("image_inverted.jpeg");
        ocr->SetImage(image);

        cout << "test1" << endl;
        char* str = ocr->GetUTF8Text();
        outText = string(str);
        cout << outText << endl;
        cout << "test2" << endl;

        ocr->Clear();
        //ocr->End();

        delete ocr;
        if (str)
            delete[] str;
        pixDestroy(&image);
    }

    system("pause");
}

VS2019 - v142
Windows 10
vcpkg build of Tesseract (static libs) - 4.1.1#5

Zdenko Podobny

Dec 31, 2020, 4:31:41 PM
to tesser...@googlegroups.com
I am not able to reproduce the problem, but I do not use vcpkg (so maybe that is where the problem is):

1. I used the official OpenCV build for Windows, https://netix.dl.sourceforge.net/project/opencvlibrary/4.5.1/opencv-4.5.1-vc14_vc15.exe -> installed to F:\opencv2
2. Because I use opencv2, I prefer to use "minimalistic tesseract" as described in https://spell.linux.sk/building-minimalistic-tesseract
4. If there is any strange behaviour, you should use the official training data: e.g. I used https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata
5. The modified code (tess_cv.cpp) looks like this:
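#include <cstdlib>
#include <iostream>
#include <string>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

int main() {
    std::string imPath = "image.jpeg";
    cv::Mat im = cv::imread(imPath, cv::IMREAD_GRAYSCALE);
    if (im.empty()) {  // imread returns an empty Mat when the file cannot be read
        std::cout << "Failed to read " << imPath << std::endl;
        return 1;
    }

    setMsgSeverity(9);  // turn off leptonica messages

    tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();

    if (ocr->Init(NULL, "eng")) {
        std::cout << "Failed to initialize." << std::endl;
        delete ocr;  // do not leak the API object on failure
    } else {

        ocr->SetVariable("user_defined_dpi", "96");
        // 8-bit grayscale: 1 byte per pixel; im.step is the row stride in bytes
        ocr->SetImage(im.data, im.cols, im.rows, 1, static_cast<int>(im.step));

        std::cout << "test1" << std::endl;

        char* str = ocr->GetUTF8Text();
        if (str)  // GetUTF8Text can return NULL
            std::cout << str << std::endl;
        std::cout << "test2" << std::endl;

        ocr->Clear();
        ocr->End();

        delete ocr;
        delete[] str;  // GetUTF8Text allocates with new[]; deleting NULL is safe
    }

    system("pause");
}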


6. Compiled from the command line with:
cl /EHsc tess_cv.cpp /If:\win64_msvc_min\include /If:\opencv2\opencv\build\include /link /LIBPATH:F:/WIN64_MSVC_MIN/LIB /LIBPATH:f:\opencv2\opencv\build\x64\vc15\lib\ tesseract50.lib leptonica-1.81.0.lib opencv_world451.lib /machine:x64 /out:tess_cv.exe

Output of tess_cv.exe:
test1
Python3WebSpider

test2


8. The statement "using namespace std" is generally considered bad practice and has been known (in the past) to cause compilation errors with tesseract.
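
To illustrate the kind of name clash this point is about, here is a minimal sketch. The namespace ocrlib and the helper label_text are hypothetical stand-ins made up for this example; they are not Tesseract's real headers or API. The idea is only that once two namespaces both declare the same name, a pair of using-directives makes the unqualified name ambiguous, while qualified names keep compiling.

```cpp
#include <string>

// Hypothetical third-party namespace (NOT Tesseract's real API). It stands in
// for any library header that declares a name which also exists in std.
namespace ocrlib {
    struct string { const char* data; };
    inline const char* describe(const string& s) { return s.data; }
}

// With both directives below active, an unqualified `string` would be
// ambiguous between std::string and ocrlib::string and fail to compile:
//
//   using namespace std;
//   using namespace ocrlib;
//   string s;   // error: reference to 'string' is ambiguous

// Qualifying every name avoids the clash regardless of what headers declare.
inline std::string label_text(const ocrlib::string& s) {
    return std::string("text: ") + ocrlib::describe(s);
}
```

Calling label_text(ocrlib::string{"abc"}) gives a std::string built from the library type without any using-directive in scope.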
 
Zdenko



Minseok Kim

Dec 31, 2020, 4:37:13 PM
to tesseract-ocr
I don't know if this adds anything, but maybe you have some insight into it.
I find that this works fine on more modern computer setups, but fails on older ones.
One of the systems I tried to run this code on uses an i5 2400, and it simply crashes without even an error message.
Again, the debug build works on it, but the release build does not.

Zdenko Podobny

Dec 31, 2020, 4:47:27 PM
to tesser...@googlegroups.com
I remember the opposite situation (on Windows): debug was crashing while release was OK.
I also remember some problems with static builds. Try to build tesseract yourself as described in the link above and use a shared library. It is not a big deal.
Maybe the problem is related to SSE/FMA/AVX/AVX2 support. AFAIK autodetection should be fixed in the latest 4.1 version, so I suggest trying the 5.alpha version, where this kind of problem should be fixed.
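
One way to check the SIMD theory on the failing machine is to print what the CPU itself reports before calling Init. The following is only a rough sketch, not how Tesseract does its own detection: the names CpuFeatures, decode_features, read_cpuid, detect_cpu_features and the HAVE_CPUID macro are made up for this example. It reads the CPUID feature bits only and does not verify that the operating system has enabled AVX state saving, so treat the result as a hint. For reference, an i5 2400 (Sandy Bridge) should report AVX but neither FMA nor AVX2.

```cpp
#include <cstdio>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

struct CpuFeatures { bool sse41; bool avx; bool fma; bool avx2; };

// Pure decoding step: ecx1 is ECX from CPUID leaf 1, ebx7 is EBX from
// CPUID leaf 7 (subleaf 0). Bit positions follow the x86 CPUID layout.
inline CpuFeatures decode_features(unsigned ecx1, unsigned ebx7) {
    CpuFeatures f;
    f.fma   = ((ecx1 >> 12) & 1u) != 0;
    f.sse41 = ((ecx1 >> 19) & 1u) != 0;
    f.avx   = ((ecx1 >> 28) & 1u) != 0;
    f.avx2  = ((ebx7 >> 5) & 1u) != 0;
    return f;
}

// Reads one CPUID leaf into regs = {EAX, EBX, ECX, EDX}. Returns false and
// leaves regs zeroed when the leaf (or CPUID itself) is not available.
inline bool read_cpuid(unsigned leaf, unsigned subleaf, unsigned regs[4]) {
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#if HAVE_CPUID && defined(_MSC_VER)
    int r[4];
    __cpuid(r, 0);  // leaf 0: EAX holds the highest supported leaf
    if (static_cast<unsigned>(r[0]) < leaf) return false;
    __cpuidex(r, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i) regs[i] = static_cast<unsigned>(r[i]);
    return true;
#elif HAVE_CPUID
    return __get_cpuid_count(leaf, subleaf,
                             &regs[0], &regs[1], &regs[2], &regs[3]) != 0;
#else
    (void)leaf; (void)subleaf;
    return false;  // not an x86 build: report no features
#endif
}

inline CpuFeatures detect_cpu_features() {
    unsigned leaf1[4], leaf7[4];
    read_cpuid(1, 0, leaf1);
    read_cpuid(7, 0, leaf7);
    return decode_features(leaf1[2], leaf7[1]);
}

inline void print_cpu_features(const CpuFeatures& f) {
    std::printf("SSE4.1=%d AVX=%d FMA=%d AVX2=%d\n",
                f.sse41, f.avx, f.fma, f.avx2);
}
```

Calling print_cpu_features(detect_cpu_features()) as the first statement of main on both the working and the failing machine shows whether they differ in FMA/AVX2 support. If they do, a release build compiled with, or dispatching to, instructions the older CPU lacks would be a plausible cause of a crash without any error message.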

Zdenko



Minseok Kim

Dec 31, 2020, 5:31:36 PM
to tesser...@googlegroups.com
Ok, thank you for the help.

