Configuration / Documentation


troplin

Apr 13, 2012, 3:20:34 AM
to tesser...@googlegroups.com
Hello,

Is there any documentation about config files and configuration variables?
I am especially interested in a list of the most important/useful variables from a user's point of view.

Regarding config files and the API, is the "api_config" file still used, or is it just a relic from version 2?

troplin

Zdenko Podobný

Apr 13, 2012, 6:10:34 AM
to tesser...@googlegroups.com
As far as I know, the only documentation for the variables is in the source code.
Tom showed in the VS2008 docs [1] how to list them easily on Windows (Linux users can use find and grep).
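For those without the VS2008 tools, the find-and-grep approach can be sketched as a small Python script. It assumes parameters are declared with macros shaped like the BOOL_INIT_MEMBER(load_system_dawg, true, "Load system word dawg.", ...) line quoted in this thread, i.e. KIND_[INIT_]VAR/MEMBER(name, default, "description", ...). The macro list and the regex are assumptions, and a description split over several string literals is cut to its first literal.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple

# Assumed declaration shape: KIND_[INIT_]VAR|MEMBER(name, default, "description", ...)
PARAM_RE = re.compile(
    r'\b(BOOL|INT|STRING|double)_(?:INIT_)?(?:VAR|MEMBER)\s*\(\s*'
    r'(\w+)\s*,\s*'                          # parameter name
    r'("(?:[^"\\]|\\.)*"|[^,]+?)\s*,\s*'     # default: string literal or plain expression
    r'"((?:[^"\\]|\\.)*)"'                   # first string literal of the description
)


class Param(NamedTuple):
    kind: str
    name: str
    default: str
    description: str


def scan_source(text: str) -> list[Param]:
    """Return every parameter declaration found in one file's text."""
    return [Param(*m.groups()) for m in PARAM_RE.finditer(text)]


def scan_tree(root: Path) -> Iterator[Param]:
    """Walk a source checkout and yield declarations from .h/.cpp files."""
    for path in sorted(root.rglob("*")):
        if path.suffix in {".h", ".cpp"} and path.is_file():
            yield from scan_source(path.read_text(errors="replace"))
```

Calling `scan_tree(Path("path/to/tesseract"))` and printing each tuple gives roughly the name/default/description listing that grep would, but already split into columns. Macro definitions such as `#define BOOL_MEMBER(name, val, comment, vec)` are not matched, because the regex requires a quoted description.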

I am not sure which variables are important/useful; it depends on the circumstances (e.g. the *_debug_* variables).

The "api_config" file (tessdata/configs/api_config) [2] is a regular config file (useful e.g. if you run tesseract from the command line) that sets the variable tessedit_zero_rejection [3] to true.
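Since api_config is just a regular config file, its contents can be read like any other one. The sketch below assumes the usual plain-text layout: one variable per line as a name, whitespace, then a value, with blank lines and lines starting with # ignored. It only collects the strings; how tesseract interprets each value (for example how a boolean is spelled) is left to tesseract.

```python
from pathlib import Path


def parse_config(text: str) -> dict[str, str]:
    """Parse tesseract-style config text into {variable: value}.

    Assumed format: one 'name value' pair per line; blank lines and lines
    starting with '#' are skipped; the value is everything after the first
    run of whitespace, with surrounding whitespace stripped.
    """
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split once on any whitespace (space or tab)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value  # in this sketch, a later line overrides an earlier one
    return params


def load_config(path: Path) -> dict[str, str]:
    """Read and parse a config file such as tessdata/configs/api_config."""
    return parse_config(path.read_text())
```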

Zdenko

[1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/tools.html#id2
[2] http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/configs/api_config
[3] http://zdenop.github.com/tesseract-doc/classtesseract_1_1_tesseract.html#a8ad03214a06d9531a0dae0a80207baaf

troplin

Apr 16, 2012, 3:27:16 AM
to tesser...@googlegroups.com, zde...@gmail.com
Thanks for the pointers.

In v2.04, api_config was implicitly used by the TessDLL C wrapper, so I assumed that it had a special role.
But this doesn't seem to be the case anymore.

Regarding the options:
- Most of them are really too low-level to expose to end users who want to tune recognition for their documents.
- Some of them are only suited to the command-line tool, not the API (e.g. tessedit_create_hocr; the API has an explicit GetHOCRText call).
- Some of them are useful and high-level (e.g. tessedit_pageseg_mode), but I don't know exactly which ones.
- The inline documentation is mostly useless if you don't know the tesseract internals, e.g. BOOL_INIT_MEMBER(load_system_dawg, true, "Load system word dawg.",...
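One way a plugin could handle this is to expose only a short, curated whitelist of high-level variables and write the user's choices out as an ordinary config file. The sketch below uses a hypothetical whitelist, `EXPOSED`, that contains only tessedit_pageseg_mode, the one high-level example named above. Which other variables belong on it is exactly the open question, and the validation rule (a non-negative integer string) is an illustrative assumption, not a range taken from tesseract.

```python
from typing import Callable

# Hypothetical whitelist: user-facing variables the plugin chooses to expose,
# each paired with a validator for its value string. Only
# tessedit_pageseg_mode comes from the discussion; extend as needed.
EXPOSED: dict[str, Callable[[str], bool]] = {
    "tessedit_pageseg_mode": lambda v: v.isdigit(),
}


def to_config(settings: dict[str, str]) -> str:
    """Render whitelisted user settings as 'name value' config lines.

    Raises ValueError for variables that are not exposed or that fail
    validation, so low-level or debug variables never come from user input.
    """
    lines = []
    for name in sorted(settings):
        value = settings[name]
        check = EXPOSED.get(name)
        if check is None:
            raise ValueError(f"{name} is not a user-configurable option")
        if not check(value):
            raise ValueError(f"invalid value {value!r} for {name}")
        lines.append(f"{name} {value}")
    return ("\n".join(lines) + "\n") if lines else ""
```

Options that only matter to the command-line tool, such as tessedit_create_hocr, would simply stay off the list, since the API offers GetHOCRText for that purpose.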

The background is that we are building a tesseract plugin for our software, and now I want to document the most important configuration options for our customers.