tess-two with tessdata_fast crashes


NY C

Dec 7, 2019, 8:37:49 PM
to tesseract-ocr

Hi, I am using tess-two for OCR (Alex Cohn's fork: https://github.com/alexcohn/tess-two).


Code:

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.setDebug(true);
        baseApi.init(pathfiles, language);  // crashes here with tessdata_fast
        //baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
        baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
        baseApi.setImage(bmp);
        result = baseApi.getUTF8Text();
        baseApi.end();  // release native resources


The code runs perfectly when I use this tessdata: https://github.com/tesseract-ocr/tessdata

But when I use tessdata_fast (https://github.com/tesseract-ocr/tessdata_fast), the code crashes on baseApi.init.


There is no error message since the init method calls native C++. As far as I can trace, the init method crashes on this line:

boolean success = nativeInitOem(mNativeData, datapath, language, ocrEngineMode);


I also tried to set the OEM like this: 

  baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);


I have tried all four OEM values:

(OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 2, OEM_DEFAULT = 3)

Each of them crashes as well.


How could I fix this?



Zdenko Podobny

Dec 8, 2019, 9:17:12 AM
to tesser...@googlegroups.com
If you want to use the API, you need to spend some time with the docs and source code.
You could find out quite quickly that CUBE was removed from Tesseract and is not available in version 4.
 
Zdenko



NY C

Dec 8, 2019, 10:37:37 AM
to tesseract-ocr
I see.

However, I can find only four OEM parameters in the tess-two source code:

    @IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
    public @interface OcrEngineMode {}

    /** Run Tesseract only - fastest */
    public static final int OEM_TESSERACT_ONLY = 0;

    /** Run Cube only - better accuracy, but slower */
    @Deprecated
    public static final int OEM_CUBE_ONLY = 1;

    /** Run both and combine results - best accuracy */
    @Deprecated
    public static final int OEM_TESSERACT_CUBE_COMBINED = 2;

    /** Default OCR engine mode. */
    public static final int OEM_DEFAULT = 3;

I honestly cannot find a suitable OEM parameter, and I don't think tess-two defines any others.
(Again, the version I use is https://github.com/alexcohn/tess-two, 9.0.0.)

Could you please give me some more tips?




Quan Nguyen

Dec 8, 2019, 11:38:56 AM
to tesseract-ocr
There are new OcrEngineMode values.

NY C

Dec 9, 2019, 1:14:49 AM
to tesseract-ocr
I know there are new OcrEngineMode values in Tesseract, but not in tess-two.

In Tesseract 4.x, OcrEngineMode is:

enum OcrEngineMode {
  OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest; deprecated
  OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
  OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
                                // to Tesseract when things get difficult.
                                // deprecated
  OEM_DEFAULT,                  // Specify this mode when calling init_*(),
                                // to indicate that any of the above modes
                                // should be automatically inferred from the
                                // variables in the language-specific config,
                                // command-line configs, or if not specified
                                // in any of the above should be set to the
                                // default OEM_TESSERACT_ONLY.
  OEM_COUNT                     // Number of OEMs
};

However, in the newest release of tess-two, OcrEngineMode is:

    @IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
    public @interface OcrEngineMode {}
    public static final int OEM_TESSERACT_ONLY = 0;
    @Deprecated
    public static final int OEM_CUBE_ONLY = 1;
    @Deprecated
    public static final int OEM_TESSERACT_CUBE_COMBINED = 2;
    public static final int OEM_DEFAULT = 3;

If there is no way to set OEM_LSTM_ONLY in tess-two,
I can only assume this is a bug in tess-two.
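
To make the mismatch concrete, the two listings above can be lined up by integer value. This is only a minimal sketch: the class name OemComparison and its describe helper are hypothetical, and the constant names are copied from the two listings quoted in this message. It compares names only; it does not show what the native library does with a given value, since that depends on which Tesseract version tess-two is built against.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: lines up OcrEngineMode names by integer value. */
final class OemComparison {
    // Names in declaration order, so the list index equals the integer value.
    // tess-two constants, as quoted from TessBaseAPI above.
    static final List<String> TESS_TWO = Arrays.asList(
            "OEM_TESSERACT_ONLY",
            "OEM_CUBE_ONLY",
            "OEM_TESSERACT_CUBE_COMBINED",
            "OEM_DEFAULT");

    // Tesseract 4.x enum, as quoted above (OEM_COUNT is omitted).
    static final List<String> TESSERACT_4 = Arrays.asList(
            "OEM_TESSERACT_ONLY",
            "OEM_LSTM_ONLY",
            "OEM_TESSERACT_LSTM_COMBINED",
            "OEM_DEFAULT");

    /** Returns one comparison line for the given integer value (0 to 3). */
    static String describe(int value) {
        return value + ": tess-two " + TESS_TWO.get(value)
                + " / Tesseract 4.x " + TESSERACT_4.get(value);
    }

    public static void main(String[] args) {
        for (int value = 0; value < TESS_TWO.size(); value++) {
            System.out.println(describe(value));
        }
    }
}
```

Values 0 and 3 keep the same names in both listings; values 1 and 2 are the Cube modes in tess-two and the LSTM modes in Tesseract 4.x.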




Quan Nguyen

Dec 9, 2019, 8:28:56 PM
to tesseract-ocr
Not a bug; tess-two is just not up to date with Tesseract 4.x.

NY C

Dec 9, 2019, 9:23:35 PM
to tesseract-ocr
I see. I admire the work on tess-two, which is very helpful for people developing Android applications.
Hopefully tessdata_fast and tessdata_best will be usable with tess-two in the future.

