Is this a bug? Should I report it?

ch...@sc3.net

Sep 17, 2013, 2:14:57 PM
to tesser...@googlegroups.com
Classify::InitAdaptiveClassifier() says its parameter (load_pre_trained_templates) "Should only be set to true if the necessary classifier components are present in the [lang].traineddata file." However, when it's called from Tesseract::init_tesseract_internal() (via Wordrec::program_editup()), the parameter is set solely based on the value of tessedit_ocr_engine_mode; no checks are made to ensure the necessary classifier components exist.

I'm trying to OCR using two traineddata files: one built-in and one I'm creating. The one I'm creating doesn't have the classifier components, so Classify::InitAdaptiveClassifier() is crashing on the line ASSERT_HOST(tessdata_manager.SeekToStart(TESSDATA_INTTEMP));

Is this a known problem, is there a known workaround?

Thanks,
Chris

Nick White

Sep 18, 2013, 7:49:02 AM
to tesser...@googlegroups.com
Sounds like a bug to me. I'd suggest you write a patch adding proper
detection to that function, if you have the time. Otherwise just
file the bug and someone will probably get to it (but probably not
speedily).

Nick

zdenko podobny

Sep 18, 2013, 8:36:42 AM
to tesser...@googlegroups.com
I don't think it is a bug, because the root of the problem is not in Tesseract but in ignoring the instructions[1].
If somebody decides not to follow the instructions/fulfill the requirements[1], he/she is responsible for his/her own troubles ;-)

Creating a patch at this stage (a new 3.03 alpha should be available soon) could be useless, since training is expected to change...




Zdenko

Nick White

Sep 18, 2013, 10:47:17 AM
to tesser...@googlegroups.com
On Wed, Sep 18, 2013 at 02:36:42PM +0200, zdenko podobny wrote:
> On Tue, Sep 17, 2013 at 11:14:57AM -0700, ch...@sc3.net wrote:
> > I'm trying to OCR using two traineddata files: one built-in and one I'm
> > creating. The one I'm creating doesn't have the classifier components, so
> > Classify::InitAdaptiveClassifier() is crashing on the line ASSERT_HOST
> > (tessdata_manager.SeekToStart(TESSDATA_INTTEMP));
>
> I don't think it is a bug, because the root of the problem is not in Tesseract but in
> ignoring the instructions[1].

My understanding was that Chris was intentionally creating a
training without the classifier components, for some reason. Is that
correct? If so, what's the use case, out of curiosity?

Nick White

Sep 18, 2013, 1:15:39 PM
to tesser...@googlegroups.com
On Wed, Sep 18, 2013 at 06:14:48PM +0200, zdenko podobny wrote:
>> My understanding was that Chris was intentionally creating a
>> training without the classifier components, for some reason.
>
> Me too. But then the consequences (missing requirement => stop) should not be
> considered a bug ;-) Of course there could be a nicer error message, but that
> does not change the picture for me.

At the very least it is a bug in the documentation. It states that
"Should only be set to true if the necessary classifier components
are present in the [lang].traineddata file", which is not how it
currently works.

Chris, can you explain why it's useful for you to have a training
without classifier components, please? If it isn't, we can just
correct the documentation; otherwise, we should wait for the code
drop for the 3.03 series and then add better detection.

zdenko podobny

Sep 18, 2013, 12:14:48 PM
to tesser...@googlegroups.com
Me too. But then the consequences (missing requirement => stop) should not be considered a bug ;-) Of course there could be a nicer error message, but that does not change the picture for me.

> Is that correct? If so, what's the use case, out of curiosity?

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chris Shearer Cooper

Sep 18, 2013, 6:25:56 PM
to tesser...@googlegroups.com
There isn't a good use case for this. You are correct in assuming that I encountered this issue because of a bug in my code that was generating a bad .traineddata file.

By the way, I have entered this as issue #981; zdenko has already seen it.




zdenko podobny

Sep 19, 2013, 2:26:26 AM
to tesser...@googlegroups.com
As far as I know, version 3.03 should bring changes to the training part of Tesseract, so there is no reason to make any change in the SVN code at the moment...

Zdenko



Remon Georgy

Sep 19, 2013, 12:28:35 PM
to tesser...@googlegroups.com
I'm so excited to hear there is a planned 3.03 version :) (Apologies, off-topic post!)