Re: inconsistent results from tesseract when the same TessBaseAPI object is used for decoding multiple images

Dmitri Silaev

Nov 15, 2012, 9:52:41 AM

to tesser...@googlegroups.com

Hi Ganesh,

One thing to try is clearing the adaptive classifier after every image, or after every few images. It can get spoiled after processing several dissimilar documents. On the other hand, for a few very similar documents it can actually help. Then again, it can get spoiled on a few similar pages too ))
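To make the advice concrete, here is a minimal C++ sketch of reusing one API object while clearing the adaptive classifier before every group of N images. It is written as a template so it compiles and can be tested without Tesseract; in practice Api would be tesseract::TessBaseAPI, whose ClearAdaptiveClassifier() member does the reset. RecognizeBatch and recognize_one are hypothetical names, and the Tesseract/Leptonica header paths in the comment are assumptions that depend on your version.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A minimal sketch, not Tesseract's own batching code. Api is assumed to be
// tesseract::TessBaseAPI (only ClearAdaptiveClassifier() is needed here);
// recognize_one is a hypothetical caller-supplied function that OCRs one
// image file with `api` and returns the recognized text.
template <typename Api, typename RecognizeFn>
std::vector<std::string> RecognizeBatch(Api& api,
                                        const std::vector<std::string>& image_paths,
                                        std::size_t clear_every,
                                        RecognizeFn recognize_one) {
  std::vector<std::string> results;
  results.reserve(image_paths.size());
  for (std::size_t i = 0; i < image_paths.size(); ++i) {
    // Before each group of `clear_every` images, forget whatever the
    // adaptive classifier learned from earlier images.
    // clear_every == 1 clears before every image; 0 never clears.
    if (clear_every > 0 && i % clear_every == 0) {
      api.ClearAdaptiveClassifier();
    }
    results.push_back(recognize_one(api, image_paths[i]));
  }
  return results;
}

// With real Tesseract (assumed headers <tesseract/baseapi.h> and
// <leptonica/allheaders.h>), recognize_one could look like:
//
//   auto recognize_one = [](tesseract::TessBaseAPI& api, const std::string& path) {
//     Pix* pix = pixRead(path.c_str());
//     if (pix == nullptr) return std::string();
//     api.SetImage(pix);
//     char* text = api.GetUTF8Text();   // caller owns this buffer
//     std::string out = text ? text : "";
//     delete[] text;
//     api.Clear();                      // drop image and results, keep adaptive state
//     pixDestroy(&pix);
//     return out;
//   };
```

Choosing clear_every is the judgment call described above: small values reset often, larger ones let the classifier keep adapting across a run of similar pages.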

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Nov 15, 2012 at 12:14 PM, newtotesseract <bgan...@gmail.com> wrote:

Hi friends

I am using a static TessBaseAPI object in my application. The object is initialized, and reads and processes the training data, at application startup.

Then, this application processes multiple scanned images through the TESS_API TessBaseAPI::ProcessPages() function, using the same TessBaseAPI object over and over again.

I observed that the accuracy of the decoded text degrades over time.

I was able to reproduce this issue with tesseract.exe as well, by modifying tesseractmain.cpp to create a single TessBaseAPI object and process all the images with it.

Could you please tell me whether the training data loaded into the TessBaseAPI object gets modified during processing?

Should we use a separate TessBaseAPI object for each image instead? Is this a known issue?

Thanks in advance for your time and help.

Best Regards,
- ganesh
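On the question of a separate object per image, a minimal C++ sketch of that alternative follows. It is a template only so it compiles and can be tested without Tesseract; in practice Api would be tesseract::TessBaseAPI, and RecognizeEachWithFreshApi, init and recognize_one are hypothetical names. The trade-off: no adaptive-classifier state can leak between images, but the traineddata is loaded again for every image, which is exactly what a single static object avoids.

```cpp
#include <string>
#include <vector>

// A minimal sketch: a brand-new API object per image. Api is assumed to be
// tesseract::TessBaseAPI (default-constructible, with End()); init and
// recognize_one are hypothetical caller-supplied functions.
template <typename Api, typename InitFn, typename RecognizeFn>
std::vector<std::string> RecognizeEachWithFreshApi(
    const std::vector<std::string>& image_paths, InitFn init,
    RecognizeFn recognize_one) {
  std::vector<std::string> results;
  results.reserve(image_paths.size());
  for (const std::string& path : image_paths) {
    Api api;  // fresh object: starts with an empty adaptive classifier
    if (!init(api)) {
      results.emplace_back();  // keep results aligned with image_paths
      continue;
    }
    results.push_back(recognize_one(api, path));
    api.End();  // release the loaded language data before the next image
  }
  return results;
}

// With real Tesseract, init could be (language "eng" is an assumption):
//   [](tesseract::TessBaseAPI& api) { return api.Init(nullptr, "eng") == 0; }
```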

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

Sriranga(78yrsold)

Nov 15, 2012, 11:15:24 AM

to tesser...@googlegroups.com

Hi Dmitri,
what is the command line for clearing the adaptive classifier after every image, and also after generating the traineddata file?
With warmest regards,
-sriranga(79yrs)

Dmitri Silaev

Nov 15, 2012, 2:39:15 PM

to tesser...@googlegroups.com

Sriranga,

Everything you can specify on the command line is listed when you run the tesseract executable with no parameters. As you can see, there is nothing there about the adaptive classifier. In any case, you won't need it: clearing it is only an API routine for programmers and has no use on the command line.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

Sriranga(78yrsold)

Nov 15, 2012, 9:27:56 PM

to tesser...@googlegroups.com

Dmitri,
I am thankful for the clarification.

Meanwhile, I am still not able to achieve good accuracy for Kannada, even with the latest version. At present the accuracy is approximately 80-85%.

With Warmest Regards,
-sriranga(79yrs)

zdenko podobny

Nov 16, 2012, 5:13:13 PM

to tesser...@googlegroups.com

On Fri, Nov 16, 2012 at 2:29 AM, newtotesseract <bgan...@gmail.com> wrote:

Hi Dmitri,

How do we clear the adaptive classifier?

Could you please tell me which API function clears the adaptive classifier?

ClearAdaptiveClassifier[1] ;-)

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h?r=760#290

Best Regards,
- ganesh

xben...@gmail.com

Jan 7, 2016, 8:26:09 AM

to tesseract-ocr

You can use the API.ClearAdaptiveClassifier() function or, if it is not available for some reason (e.g. you are using tess-two), disable the adaptive classifier altogether:

API.SetVariable("classify_enable_learning","0");
API.SetVariable("classify_enable_adaptive_matcher","0");
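The same two settings through the C++ API, as a minimal sketch: TessBaseAPI::SetVariable(name, value) returns false when the variable name is not recognized, so the helper reports whether both settings were accepted. It is a template only so it can be tested without Tesseract; DisableAdaptiveLearning is a hypothetical name, and the sketch assumes, as the post implies, that these parameters can be changed after Init(). Note that this turns the adaptive classifier off rather than resetting it.

```cpp
// A minimal sketch, assuming Api is an already-initialized
// tesseract::TessBaseAPI (any type with bool SetVariable(const char*, const char*)).
template <typename Api>
bool DisableAdaptiveLearning(Api& api) {
  // Stop the adaptive classifier from learning from recognized words.
  bool ok = api.SetVariable("classify_enable_learning", "0");
  // Stop consulting the adaptive classifier during character matching.
  // The call comes first so it runs even if the previous one failed.
  ok = api.SetVariable("classify_enable_adaptive_matcher", "0") && ok;
  return ok;
}
```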
