Unsure why tesseract isn't returning the correct text


DR

Apr 21, 2018, 4:09:30 AM
to tesseract-ocr
I have this image I want to turn into text:

To clean it up, I used Fred's textcleaner script (http://www.fmwconcepts.com/imagemagick/textcleaner/index.php) and ran

./textcleaner -i 2 names.png result.png

on the image. The result is now:

It looks a lot cleaner, so now I use tesseract to turn it into text:

tesseract result.png stdout -psm 7 -l eng --user-words /path/to/eng.user-words --user-patterns /path/to/eng.user-patterns

with the following files. eng.user-words:

BLAZIKEN
RAPIDASH
VICTREEBEL
SHARPEDO
PORYGON-Z
AZELF

eng.user-patterns:

-M
 
and /path/to/configs/bazaar:

load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns
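
For anyone trying to reproduce this, the setup described above could be recreated roughly as follows. This is only a sketch: the scratch directory, the `WORKDIR` variable, and the `run_ocr` helper are illustrative names that do not appear in the post. Passing the config file as the last argument is an assumption about how the config was meant to be applied (the Tesseract 3.x command line accepts config files after the options), and it has not been checked against this image.

```shell
#!/bin/sh
# Sketch only: WORKDIR and run_ocr are hypothetical names, not from the post.
WORKDIR="${WORKDIR:-$(mktemp -d)}"

# Word list: one expected name per line.
printf '%s\n' BLAZIKEN RAPIDASH VICTREEBEL SHARPEDO PORYGON-Z AZELF \
    > "$WORKDIR/eng.user-words"

# Pattern file: the single "-M" entry from the post.
printf '%s\n' '-M' > "$WORKDIR/eng.user-patterns"

# Config file: turn off the built-in dictionaries and name the user-file suffixes.
cat > "$WORKDIR/bazaar" <<'EOF'
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns
EOF

# Run Tesseract 3.x on one image as a single text line (-psm 7).
# The trailing config path is an assumption, see the note above.
run_ocr() {
    tesseract "$1" stdout -psm 7 -l eng \
        --user-words "$WORKDIR/eng.user-words" \
        --user-patterns "$WORKDIR/eng.user-patterns" \
        "$WORKDIR/bazaar"
}

# Only attempt OCR when both the binary and the cleaned image are present.
if command -v tesseract >/dev/null 2>&1 && [ -f result.png ]; then
    run_ocr result.png
fi
```

Keeping the three files together in one directory makes it easier to confirm that the paths given on the command line point at the files that were actually edited.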

Yet my output is:

BlHZIKEN-M RHPIDHSH-M VlETREEBEl-M SHHRPEIIIJ-M PURYEflN-Z-M HZELF-M 

Since case isn't an issue for me, the only problems are "A" showing up as "H", "C" showing up as "LE", "DO" showing up as "IIIJ", and "GO" showing up as "Efl" (with "fl" being one character).
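
The mismatches listed above can also be checked mechanically. The sketch below compares the OCR line with the line I expected, ignoring case. The `compare_words` helper is a hypothetical name, and the `expected` string is an assumption built from the word list plus the "-M" suffix; it relies on plain word splitting, so it only suits single-line output like this.

```shell
#!/bin/sh
# Sketch only: compare_words and the "expected" line are illustrative assumptions.

# Print "expected -> got" for every word that differs, ignoring case.
compare_words() {
    exp_uc=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    got_uc=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    # Word-split the expected line into the positional parameters.
    set -- $exp_uc
    for got in $got_uc; do
        [ "${1:-}" = "$got" ] || printf '%s -> %s\n' "${1:-}" "$got"
        # Advance to the next expected word, if any remain.
        if [ "$#" -gt 0 ]; then shift; fi
    done
}

expected="BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHARPEDO-M PORYGON-Z-M AZELF-M"
actual="BlHZIKEN-M RHPIDHSH-M VlETREEBEl-M SHHRPEIIIJ-M PURYEflN-Z-M HZELF-M"

compare_words "$expected" "$actual"
```

Each printed line pairs one expected word with what came back, which makes it easy to see whether a change to the image or the options fixes any of the six words.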

I'm not sure whether the image can be made any clearer or whether I'm doing something wrong with tesseract. Any help is appreciated.

ShreeDevi Kumar

Apr 21, 2018, 4:48:15 AM
to tesser...@googlegroups.com

BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M  RAZELF-M

with

 tesseract -v
tesseract 4.0.0-beta.1-133-g5435c
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libopenjp2 2.3.0
 Found AVX
 Found SSE

tesseract names.png - --tessdata-dir ./tessdata_best
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 547
BLAZIKEN-M RAPIDASH-M VICTREEBEL-M SHRRPEDO-M PORYGON-I-M  RAZELF-M


Which version of tesseract are you using?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/cc3d86fb-4d9f-4e77-a5dd-23a41df213e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

DR

Apr 21, 2018, 4:20:01 PM
to tesseract-ocr
I'm using:

tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

Zdenko Podobny

Apr 21, 2018, 4:21:49 PM
to tesser...@googlegroups.com
Time for upgrade?

Zdenko


DR

Apr 21, 2018, 4:25:28 PM
to tesseract-ocr
Where can I find tesseract 4 beta? The github repo goes up to 4 alpha.

Zdenko Podobny

Apr 21, 2018, 4:40:20 PM
to tesser...@googlegroups.com
Really? Did you check it before writing to the forum?

Zdenko


DR

Apr 21, 2018, 5:12:32 PM
to tesseract-ocr
I double-checked, and there is a 4.0.0-beta.1 tag. I assume you installed that using git?


ShreeDevi Kumar

Apr 22, 2018, 8:40:10 AM
to tesser...@googlegroups.com
Yes, please build from the latest code on the GitHub master branch. That way you will have all the bug fixes and updates.
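
A build from master might look roughly like the sketch below. It assumes a Unix-like system with git, autotools, a C++ compiler, and the Leptonica development files already installed. `build_tesseract`, `tesseract_major`, and the `SRC_DIR` default are hypothetical names used only for illustration, and the exact build steps may differ on your platform.

```shell
#!/bin/sh
# Sketch only: build_tesseract, tesseract_major and SRC_DIR are hypothetical names.
SRC_DIR="${SRC_DIR:-$HOME/src/tesseract}"

# Clone and build the master branch. The body runs in a subshell so the
# caller's working directory is left unchanged.
build_tesseract() (
    git clone https://github.com/tesseract-ocr/tesseract.git "$SRC_DIR" &&
    cd "$SRC_DIR" &&
    # To build the tagged beta instead: git checkout 4.0.0-beta.1
    ./autogen.sh &&
    ./configure &&
    make &&
    sudo make install &&
    sudo ldconfig
)

# Read `tesseract -v` output on stdin and print the major version number.
tesseract_major() {
    sed -n '1s/^tesseract \([0-9][0-9]*\).*/\1/p'
}
```

After running `build_tesseract`, something like `tesseract -v 2>&1 | tesseract_major` should show which major version is first on the PATH. The `2>&1` is there because some versions print the version banner to stderr.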
