Trained model works on the command line but not in a bytedeco service?


Adam Funk

Mar 6, 2020, 7:45:27 AM
to tesser...@googlegroups.com
Hi,

I've downloaded some of the *.traineddata files from
<https://github.com/tesseract-ocr/tessdata_best> --- as far as I can
tell, all the ones I have tested work on the command line, e.g.,

$ tesseract --tessdata-dir /opt/data/tessdata-new/ --list-langs
...
swe
...

$ tesseract --tessdata-dir /opt/data/tessdata-new/ -l swe test.png stdout
[produces output with no errors]

$ tesseract --version
tesseract 4.1.0
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.37 :
libtiff 4.0.10 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE


but when I try to use the same swe.traineddata file in a web service
built with Grails and running on Tomcat, it segfaults so badly that the
whole Tomcat server has to be killed and restarted. The Grails service
has the following dependency:

compile group: 'org.bytedeco', name: 'tesseract-platform', version: '4.0.0-1.5'

which is a slightly older version than the command-line one (4.1.0), but
the data files are supposed to work with Tesseract 4.

Any ideas why?

Thanks,
Adam

Adam Funk

Mar 6, 2020, 8:57:55 AM
to tesser...@googlegroups.com
Hi again,

I've updated the web service to use a newer version:

compile group: 'org.bytedeco', name: 'tesseract-platform', version: '4.1.0-1.5.2'

It still segfaults when I try to use swe.traineddata but at least the
service recovers instead of dying in place.

Adam

Shree Devi Kumar

Mar 6, 2020, 9:01:17 AM
to tesseract-ocr
The files from tessdata_best only support LSTM mode, i.e. --oem 1. Please check which engine mode your web service is using.
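
One way to check this outside the web service is to pass the engine mode explicitly on the command line. The following is only a sketch: the helper name ocr_with_oem is made up for illustration, and the paths are the ones from the original post. With tessdata_best files, mode 1 (LSTM only) should behave like the working command above, while mode 0 (legacy only) is expected to fail, because those files carry no legacy-engine components.

```shell
# Run OCR with an explicit engine mode instead of relying on the default.
# Arguments: tessdata directory, language, image, engine mode (defaults to 1).
# --oem 1 selects the LSTM engine; --oem 0 selects the legacy engine.
# Note: ocr_with_oem is a hypothetical helper, not part of Tesseract.
ocr_with_oem() {
    tessdata_dir=$1
    lang=$2
    image=$3
    oem=${4:-1}
    tesseract --tessdata-dir "$tessdata_dir" -l "$lang" --oem "$oem" "$image" stdout
}

# Usage:
#   ocr_with_oem /opt/data/tessdata-new/ swe test.png      # LSTM only
#   ocr_with_oem /opt/data/tessdata-new/ swe test.png 0    # legacy only
```

If the legacy-only run fails while the LSTM-only run works, the service is most likely requesting an engine mode that the tessdata_best file does not support.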

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b13d76da-24ec-633c-b281-29bbcf9cb0e0%40sheffield.ac.uk.

Adam Funk

Mar 6, 2020, 10:11:03 AM
to tesser...@googlegroups.com

Thanks very much --- I think that was the problem.
