Re: Where can I set tessedit_ocr_engine_mode for tesseract-ocr?


Nick White

Jul 15, 2013, 10:38:00 AM
to tesser...@googlegroups.com
Hi,

> I never set the tessedit_ocr_engine_mode
> configuration for tesseract, so I assume that it is using the default mode
> which, from my reading, will infer the best mode to use from the engine for the
> particular language.

You're right in your assumptions: it will use the default (non-cube)
mode unless you tell it otherwise. You're also correct that the
default mode is likely the best for Spanish.

> Finally, where can I set the tessedit_ocr_engine_mode? I cannot find this in
> any documentation online. Do I need to modify the source before compiling? Is
> there a configuration file that I can modify or add?

It's a configuration variable, which you set the same way as any
other configuration variable. That is documented a little here:
http://code.google.com/p/tesseract-ocr/wiki/ControlParams

I'm afraid I can't help you with performance, as I have no knowledge
of android stuff. You might find it useful to look at the code of
Renard's excellent looking Text Fairy app for android:
https://github.com/renard314/textfairy

Nick

bear

Jul 15, 2013, 11:42:57 AM
to tesser...@googlegroups.com
Thanks, Nick.  After poking through the source, it seems that one of my assumptions was incorrect: tesseract defaults to the OEM_TESSERACT_ONLY mode, so by default it will not try to infer the best mode to use for individual languages.

tesseractclass.cpp:

INT_INIT_MEMBER(tessedit_ocr_engine_mode, tesseract::OEM_TESSERACT_ONLY,
                "Which OCR engine(s) to run (Tesseract, Cube, both)."
                " Defaults to loading and running only Tesseract"
                " (no Cube,no combiner)."
                " Values from OcrEngineMode enum in tesseractclass.h)",
                this->params()),

Shree Devi Kumar

Jul 15, 2013, 1:23:07 PM
to tesser...@googlegroups.com
You can unpack the traineddata file and take a look at the .config file in it.

For example, in hin.traineddata the config file selects the combined mode (Cube as well as Tesseract), which makes it very slow. I changed the config value to Tesseract only and recombined the file, and that improved the speed.
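
The config edit described above can be sketched roughly as follows. This is only an illustration, assuming the unpacked .config file is plain text with one "name value" pair per line (as in the tessedit_ocr_engine_mode line quoted elsewhere in this thread). SetEngineMode is a hypothetical helper, not part of Tesseract, and unpacking and recombining the traineddata file are left to Tesseract's own tools.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of Tesseract): returns a copy of a language
// .config text with tessedit_ocr_engine_mode set to `mode`.
// Assumes one "name value" pair per line. Existing settings of the variable
// are rewritten; if there is none, a new line is appended at the end.
std::string SetEngineMode(const std::string& config_text, int mode) {
  const std::string key = "tessedit_ocr_engine_mode";
  std::istringstream in(config_text);
  std::ostringstream out;
  std::string line;
  bool replaced = false;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string name;
    fields >> name;  // first whitespace-separated token is the variable name
    if (name == key) {
      out << key << ' ' << mode << '\n';
      replaced = true;
    } else {
      out << line << '\n';  // keep every other setting untouched
    }
  }
  if (!replaced) {
    out << key << ' ' << mode << '\n';
  }
  return out.str();
}
```

Calling SetEngineMode(text, 0) on the contents of the unpacked config would switch it to Tesseract only, since 0 is the value of OEM_TESSERACT_ONLY in the OcrEngineMode enum.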


Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 

bear

Jul 15, 2013, 1:53:57 PM
to tesser...@googlegroups.com
I see, that was very helpful.  Thanks Shree.  I unpacked Arabic, and noticed the engine mode:

tessedit_ocr_engine_mode 1

I unpacked Spanish, and it did not contain an engine mode variable declaration.  Does that mean that it will default to using tesseract only (and not cube), as defined in my tesseractclass.cpp?  Or does the absence of the variable from a language-specific .config file make it fall back to something else?

Thanks again.

Nick White

Jul 15, 2013, 12:46:37 PM
to tesser...@googlegroups.com
Oh yes, you're right, that's what I meant. I didn't read your
original question closely enough, sorry about that :p

Nick White

Jul 16, 2013, 7:03:21 AM
to tesser...@googlegroups.com
Yes, if a variable isn't mentioned in the config file it will just
go to the default, which is tesseract only.
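
The fallback behaviour can be pictured with a small sketch. It is not Tesseract's actual parameter-loading code: ResolveEngineMode is a hypothetical helper, the enum is redeclared here on the assumption that it mirrors the values of Tesseract 3.x's OcrEngineMode, and the config is assumed to be plain "name value" lines.

```cpp
#include <sstream>
#include <string>

// Assumed to mirror the values of Tesseract 3.x's OcrEngineMode enum;
// redeclared here only so the sketch compiles on its own.
enum OcrEngineMode {
  OEM_TESSERACT_ONLY = 0,
  OEM_CUBE_ONLY = 1,
  OEM_TESSERACT_CUBE_COMBINED = 2,
  OEM_DEFAULT = 3
};

// Hypothetical helper (not part of Tesseract): returns the engine mode that
// a language .config text would select. If the variable is never mentioned,
// the compiled-in default from tesseractclass.cpp (Tesseract only) is kept.
int ResolveEngineMode(const std::string& config_text) {
  int mode = OEM_TESSERACT_ONLY;  // default when the config is silent
  std::istringstream in(config_text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string name;
    int value;
    if (fields >> name >> value && name == "tessedit_ocr_engine_mode") {
      mode = value;  // an explicit setting overrides the default
    }
  }
  return mode;
}
```

With the Arabic config line quoted earlier ("tessedit_ocr_engine_mode 1") this yields OEM_CUBE_ONLY, while a config that never mentions the variable, like the Spanish one, yields OEM_TESSERACT_ONLY.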