Tesseract gives different output on Windows than on Linux


Chirs Masselli

Apr 7, 2019, 12:40:50 AM
to tesseract-ocr
I created a Python script on Linux that uses Tesseract. When I run it there, everything works and the output is correct, but when I run it on my Windows computer, Tesseract does not recognize the numbers as numbers but as words. Attached are the same .png I'm using on both Windows and Linux and the corresponding outputs.
I understand that the two machines run different versions. If that's the problem, how do I download the newest version for Windows 10? The newest Windows download listed there is the one I already have (https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows).

version info for linux:
$ tesseract -v
tesseract 4.0.0-beta.1
 leptonica-1.76.0
  libjpeg 8 (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found SSE

version info for w10:
tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

linuxout.txt
windows output.txt
samp.png
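To pinpoint exactly where the two attached outputs diverge (for example, digits that come back as words), a minimal Python sketch using the standard library's difflib can compare them line by line. The function name diff_ocr_outputs is hypothetical, not part of any Tesseract tooling:

```python
import difflib
from typing import List


def diff_ocr_outputs(linux_text: str, windows_text: str) -> List[str]:
    """Return unified-diff lines between two OCR outputs (empty list if identical)."""
    return list(
        difflib.unified_diff(
            linux_text.splitlines(),
            windows_text.splitlines(),
            fromfile="linux",
            tofile="windows",
            lineterm="",  # splitlines() already removed the newlines
        )
    )
```

To use it, read linuxout.txt and windows output.txt as UTF-8 strings and print each returned line; lines prefixed with - come from Linux and lines prefixed with + from Windows.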

Shree Devi Kumar

Apr 7, 2019, 12:48:54 AM
to tesser...@googlegroups.com
See https://github.com/UB-Mannheim/tesseract/wiki for the latest Windows installers.

The difference you see could also be caused by a different version of the traineddata file you are using. Try the ones from the tessdata_best repo.
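One way to rule out a traineddata mismatch is to point both machines at the same model directory explicitly. Below is a minimal sketch, assuming the Python script shells out to the tesseract CLI (the thread does not say whether it uses pytesseract or subprocess). `--tessdata-dir`, `-l` and the `stdout` output base are standard Tesseract 4 command-line options; the helper names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_cmd(image: str, tessdata_dir: str, lang: str = "eng",
                        exe: str = "tesseract") -> List[str]:
    """Build a command that OCRs `image` to stdout with an explicit tessdata directory.

    Pinning --tessdata-dir means both machines load the same .traineddata file
    instead of whatever each installer bundled.
    """
    return [exe, image, "stdout", "--tessdata-dir", str(Path(tessdata_dir)), "-l", lang]


def run_ocr(image: str, tessdata_dir: str, lang: str = "eng") -> str:
    """Run tesseract and return the recognized text (raises if tesseract fails)."""
    cmd = build_tesseract_cmd(image, tessdata_dir, lang)
    # Decode as UTF-8 explicitly rather than relying on the Windows console code page.
    result = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

Copying the same eng.traineddata into a folder on each machine and passing that folder as tessdata_dir makes the model an explicit input rather than an installer detail, so any remaining difference points at the Tesseract build itself.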


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/8809ffbd-d7ec-49c1-bc9a-4e5331fd256c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Chirs Masselli

Apr 7, 2019, 1:38:29 AM
to tesseract-ocr
After making sure everything was updated, I tried the eng.traineddata, and now I'm getting the following assertion:


lstm_recognizer_->DeSerialize(&fp):Error:Assert failed:in file ../../../../ccmain/tessedit.cpp, line 193



Chirs Masselli

Apr 7, 2019, 1:49:05 AM
to tesseract-ocr
After copying eng.traineddata from my Linux machine to my Win10 machine, I still get the same assertion.

Shree Devi Kumar

Apr 7, 2019, 1:51:01 AM
to tesser...@googlegroups.com
How did you get the traineddata? You need to use the `raw` link.
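Shree's question points at a frequent mistake: downloading the GitHub page (a "blob" URL, which serves HTML) instead of the file itself (the "raw" URL). Below is a minimal Python sketch of a quick check, under the assumption that an HTML page saved under a .traineddata name starts with an HTML doctype or `<html>` tag. The helper names are hypothetical:

```python
def blob_to_raw_url(url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching 'raw' download URL."""
    return url.replace("/blob/", "/raw/", 1)


def is_html_prefix(data: bytes) -> bool:
    """Heuristic: True if the bytes look like the start of an HTML page."""
    head = data.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def looks_like_html(path: str) -> bool:
    """Check only the first 512 bytes of a downloaded file."""
    with open(path, "rb") as f:
        return is_html_prefix(f.read(512))
```

If looks_like_html returns True for your eng.traineddata, re-download it through the raw URL; a False result only rules out this one mistake, not a corrupt or incompatible model.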





Chirs Masselli

Apr 7, 2019, 2:03:08 AM
to tesseract-ocr
Yes that's the exact one I used.

Shree Devi Kumar

Apr 7, 2019, 4:10:50 AM
to tesser...@googlegroups.com
If the problem is with the new Windows version, please file an issue at https://github.com/UB-Mannheim/tesseract/issues, as the maintainers might not check this forum often.


stjo...@googlemail.com

Apr 7, 2019, 5:39:11 AM
to tesseract-ocr
The assertion you get was removed in commit dc8745e6fd4c6c070076c44565924faa0d0643a7 two years ago, so you are using an outdated version of Tesseract that is no longer supported.

Use `tesseract --version` to see the version of your installed Tesseract.
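When a script might pick up a different tesseract binary than the one you checked by hand, it can report the version itself. A minimal Python sketch, assuming the first line of the output has the form shown earlier in this thread (e.g. "tesseract 4.0.0-beta.1"); since older builds may print this banner to stderr instead of stdout, both streams are checked. The helper names are hypothetical:

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_tesseract_version(output: str) -> Optional[Tuple[str, bool]]:
    """Return (version_string, is_prerelease) from `tesseract --version` output."""
    match = re.search(r"^tesseract\s+v?(\S+)", output, re.MULTILINE)
    if not match:
        return None
    version = match.group(1)
    # Treat alpha, beta and release-candidate builds as pre-releases.
    prerelease = bool(re.search(r"alpha|beta|rc", version, re.IGNORECASE))
    return version, prerelease


def installed_tesseract_version(exe: str = "tesseract") -> Optional[Tuple[str, bool]]:
    """Run the binary (FileNotFoundError if it is not on PATH) and parse its banner."""
    result = subprocess.run([exe, "--version"], capture_output=True, encoding="utf-8")
    return parse_tesseract_version(result.stdout + result.stderr)
```

With the banners quoted in this thread, the Linux build parses as ("4.0.0-beta.1", True) and the Windows build as ("4.00.00alpha", True), so both are pre-releases.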

Chirs Masselli

Apr 7, 2019, 6:03:25 PM
to tesseract-ocr
SOLVED
I solved it by downloading the 32-bit setup. It also fixed the bad recognition on Windows vs. Linux, without switching the traineddata.
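The thread does not explain why the 32-bit installer behaved differently, but with several builds installed it helps to confirm which one a script actually calls. Below is a minimal Python sketch that reads the Machine field from an executable's PE header (0x014C for x86, 0x8664 for x64, per the Microsoft PE/COFF format). The function names are hypothetical, and the sketch only inspects the first 4 KB of the file:

```python
import struct

# Machine values from the PE/COFF file header.
MACHINE_TYPES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_machine(data: bytes) -> str:
    """Return the architecture recorded in a Windows PE header."""
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # Offset 0x3C of the DOS header holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found in the bytes provided")
    # The COFF header's 2-byte Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, hex(machine))


def exe_architecture(path: str) -> str:
    """Read the start of an executable such as tesseract.exe and report its architecture."""
    with open(path, "rb") as f:
        return pe_machine(f.read(4096))
```

Point exe_architecture at the tesseract.exe your script resolves (for example, the path returned by shutil.which("tesseract")) to see whether it is the 32-bit or 64-bit build.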

stjo...@googlemail.com

Apr 8, 2019, 5:24:38 AM
to tesseract-ocr


