Fresh install not recognizing text like before

Votum V

Jan 4, 2020, 3:21:25 AM

to tesseract-ocr

I've been using tesseract for a while now to read text from images that I capture with a script for a game I am automating. I recently had to do a fresh install of my Windows machine, which included re-installing tesseract on my system.

The problem is that the same images that were previously parsed and converted to text correctly are no longer being parsed correctly...

For example, one of the use cases here is taking a picture of a region in the game that contains a string like "Lv. 15". This worked fine before, but it is now returning "Nn" or "Aa" pretty consistently...

I'm just curious if anyone else has experienced something like this? I'm quite lost on what to do about this because the software is also working for other people who use my script (their tesseract is parsing out the correct values).

Some information in case it helps...

System: Windows 10 x64

Tesseract Version:

tesseract v5.0.0-alpha.20190708
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

Shree Devi Kumar

Jan 4, 2020, 3:44:39 AM

to tesseract-ocr

Please also provide tesseract version information from a machine where it is working.
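One way to collect comparable information on both machines is a short script like the one below. It is only a sketch: it assumes `tesseract` is on the PATH, and the `collect_environment` helper is a name made up for illustration. It also reports `TESSDATA_PREFIX`, since a fresh install may leave that variable pointing somewhere else.

```python
import os
import shutil
import subprocess


def collect_environment():
    """Gather the tesseract details that are worth comparing across machines."""
    exe = shutil.which("tesseract")  # None if tesseract is not on PATH
    info = {
        "executable": exe,
        "TESSDATA_PREFIX": os.environ.get("TESSDATA_PREFIX"),
        "version": None,
    }
    if exe:
        proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
        # Some builds print the version banner to stderr rather than stdout.
        info["version"] = (proc.stdout or proc.stderr).strip()
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running it on a working and a non-working machine and comparing the output should show whether the executable path or the tessdata location differs.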

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/119cd802-5b83-49b9-afd5-27600aaa3678%40googlegroups.com.

--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Votum V

Jan 4, 2020, 9:26:07 AM

to tesseract-ocr

I managed to find a user that has it working. They are on the same version as I am...

tesseract v5.0.0-alpha.20190708
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5


Shree Devi Kumar

Jan 4, 2020, 9:48:02 AM

to tesseract-ocr

Please also check that the traineddata file being used is the same on both machines. Since the file name is the same, you can compare the file sizes.
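A size comparison can miss two different files that happen to have the same length, so a checksum is a slightly stronger check. The sketch below prints both; the `fingerprint` helper is a hypothetical name, and the path to the traineddata file has to be supplied by whoever runs it, since it depends on the install.

```python
import hashlib
import sys
from pathlib import Path


def fingerprint(path):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Pass one or more traineddata paths on the command line.
    for name in sys.argv[1:]:
        size, digest = fingerprint(name)
        print(f"{size:>12}  {digest}  {name}")
```

If the size and hash printed on the working machine match those on the fresh install, the traineddata file can be ruled out as the cause.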


Zdenko Podobny

Jan 5, 2020, 2:41:15 AM

to tesser...@googlegroups.com

Your tesseract version looks strange: it should be tesseract 5.0.0-alpha-551-g99df, but you have a date instead of a git revision. How did you get it?

Please also provide an input image for testing, the command or code you use to extract text from the image, and any other relevant information (e.g. something must have changed when you reinstalled tesseract).

Without details that let us reproduce the problem, there is no way to help you.
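A reproducible report could be as small as the script below, which calls the tesseract command line directly so that no wrapper library is involved. This is only a sketch of what such a report might contain: the file name `lv_region.png` is a placeholder, the choice of `--psm 7` (treat the image as a single text line) is an assumption about what suits a short string like "Lv. 15", and the poster's actual script may do something different.

```python
import shutil
import subprocess


def build_command(image_path, lang="eng", psm=7):
    """Build the tesseract CLI call; 'stdout' sends the recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, lang="eng", psm=7):
    """Run tesseract and return (recognized_text, diagnostics)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # Hypothetical file name; replace with the image that is misread.
    text, diagnostics = run_ocr("lv_region.png")
    print("text:", repr(text))
    print("diagnostics:", diagnostics)
```

Posting the image, this command, and its output from both a working and a failing machine would give others enough to try reproducing the difference.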

Zdenko


Shree Devi Kumar

Jan 5, 2020, 9:30:40 AM

to tesseract-ocr, zdenko podobny

@zdenko podobny

That is probably the version numbering used by the Windows builds from

https://digi.bib.uni-mannheim.de/tesseract/
