Tesseract-OCR giving different results for same image on different systems.

adesh gautam

Dec 16, 2019, 7:28:37 AM
to tesseract-ocr
Hi,

I am using tesseract-ocr on my images, and I am getting different results when running Tesseract on different systems for the same image.
I am using pytesseract library.
I am setting the following parameters:
--psm 6  -c classify_enable_learning=0 -c classify_enable_adaptive_matcher=0
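For readers reproducing this, here is a minimal sketch of how these parameters might be passed through pytesseract. The helper names `build_config` and `ocr_lines` are hypothetical, and the assumption that the output is split into non-empty lines is inferred from the list-style results shown later in the thread, not confirmed by the poster.

```python
def build_config(psm=6, variables=None):
    """Build the config string handed to pytesseract from a PSM and -c variables."""
    parts = ["--psm", str(psm)]
    for name, value in (variables or {}).items():
        parts += ["-c", f"{name}={value}"]
    return " ".join(parts)


# The parameters quoted in the post.
CONFIG = build_config(
    6,
    {"classify_enable_learning": 0, "classify_enable_adaptive_matcher": 0},
)


def ocr_lines(image_path, config=CONFIG):
    """Run OCR on one image and return its non-empty text lines (hypothetical helper)."""
    # Imported lazily so the config helper works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(Image.open(image_path), config=config)
    return [line for line in text.splitlines() if line.strip()]
```

Printing `CONFIG` on each system is a quick way to confirm that both machines really pass the same options.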

Images have dpi=300.
Tesseract version:
tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

OS:
Windows 10

Does Tesseract have any system-specific optimizations or dependencies?


Shree Devi Kumar

Dec 16, 2019, 8:16:26 AM
to tesseract-ocr
Run tesseract --version on the different systems.

Are the traineddata files being used on the different systems the same?

Share an image and the different output received in each case.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/414947ab-b10a-40b8-8196-65a5bbbb3e1c%40googlegroups.com.

adesh gautam

Dec 16, 2019, 10:36:08 AM
to tesseract-ocr


Both systems have the same version of Tesseract, as I mentioned before.

The trained data is also the same: eng.traineddata.


These are the two images.

a.png

a.jpg


b.png

b.jpg


And these are the outputs for the same images on different systems.

System 1

 
a.jpg

['(A', '1, oy OFW ID CARD 2', 'Repubic of the Plasppines', 'o WJ, Department of Laor and Empioyment )', '(SSRee) Philippine Oversans Employment Admievstraticn,', 'ee ——', 'MARIA SANTOS DELA CRUZ', 'rn', '20911483', 'orion, 10', 'inde [0p {0)]', 'os', 'isle =', 'TUTTI ARG COMPANY [EI', 'dune 20, 2010']


b.jpg

['INTERNATIONAL STUDENT TY', 'EE', 'Ng ome']

 

System 2

 
a.jpg

['(~~', '% oy OFW ID CARD L', 'Nepubse of the Preappinas', '4 wi. Department of Labor and Employment »', 'Soe ac Pruippine Oversaas Employment Adminvstraion,', 'fp', 'MARIA SANTOS DELA CRUZ', 'si00an', '29911483', 'emt, 84', 'ine 1} 4 10)', 'mt', 'Mein Drbitue Bcd', '“werraci —ABo coMPany Oat', 'Si vue 90, 2010']

 
b.jpg

['DCL ae Pcl', 'R) arc orn', 'PN secret']


The output is different.

Is this normal for Tesseract?
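To put a number on "different", the two result lists can be compared line by line. The sketch below uses Python's standard `difflib`; the helper name `line_similarity` is hypothetical, the sample lines are excerpts from the a.jpg outputs above, and pairing lines by position is a simplifying assumption that only holds when both systems return the same number of lines.

```python
from difflib import SequenceMatcher


def line_similarity(lines_a, lines_b):
    """Pair lines by position and return (line_a, line_b, ratio) for each pair."""
    return [
        (a, b, SequenceMatcher(None, a, b).ratio())
        for a, b in zip(lines_a, lines_b)
    ]


# Excerpts from the a.jpg results posted above.
system_1 = ["MARIA SANTOS DELA CRUZ", "20911483", "dune 20, 2010"]
system_2 = ["MARIA SANTOS DELA CRUZ", "29911483", "Si vue 90, 2010"]

for a, b, ratio in line_similarity(system_1, system_2):
    print(f"{ratio:.2f}  {a!r}  vs  {b!r}")
```

A ratio of 1.0 means the two lines are identical; lower values flag the lines where the systems disagree.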



Shree Devi Kumar

Dec 17, 2019, 2:17:28 AM
to tesseract-ocr
Please check the file sizes of eng.traineddata on both systems. They may be different versions even though they have the same name.
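Two different files can happen to have the same size, so a checksum is a stricter check than file size alone. A minimal sketch using only the standard library follows; the helper name `file_fingerprint` is hypothetical, and the path to eng.traineddata depends on each installation.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

Running `file_fingerprint` on eng.traineddata on each system and comparing the two tuples shows whether the models are byte-for-byte identical.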




adesh gautam

Dec 17, 2019, 2:38:45 AM
to tesseract-ocr
The file size of eng.traineddata is the same on both systems: 3.92 MB.

Shree Devi Kumar

Dec 17, 2019, 11:06:16 AM
to tesseract-ocr
>There is the same version of tesseract on the two systems as i mentioned before.

OK. But is there any difference in the specs of the two systems in terms of AVX etc.? That is why the output of tesseract -v from each system would be useful.
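One way to compare the two systems is to capture the version report on each and diff the parsed fields. The sketch below assumes the report looks like the one posted at the top of the thread (a `tesseract v...` line, a `leptonica-...` line, and `Found ...` lines); the helper names are hypothetical, and the report layout may differ between Tesseract builds.

```python
import subprocess


def tesseract_version_report():
    """Capture the text printed by `tesseract --version` (stdout and stderr)."""
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True
    )
    return result.stdout + result.stderr


def parse_version_report(text):
    """Extract the version, the Leptonica version and the 'Found ...' lines."""
    report = {"version": None, "leptonica": None, "found": []}
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("tesseract "):
            report["version"] = line.split()[1]
        elif line.startswith("leptonica-"):
            report["leptonica"] = line[len("leptonica-"):]
        elif line.startswith("Found "):
            report["found"].append(line[len("Found "):])
    report["found"].sort()
    return report


def diff_reports(report_a, report_b):
    """Return only the fields whose values differ between two parsed reports."""
    return {
        key: (report_a[key], report_b.get(key))
        for key in report_a
        if report_a[key] != report_b.get(key)
    }
```

An empty result from `diff_reports` means the two builds report the same version, Leptonica release and detected CPU features.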

Also, just check the results via CLI.

I get different results when using eng.traineddata from tessdata_best and tessdata_fast:

ubuntu@tesseract-ocr:~/TEST$ tesseract unnamed.png - --tessdata-dir ~/tessdata_fast
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 195
OFW ID CARD

Republe of the Pieippings *
Department of Labor and Employment ae
Phiipploe Overseas Employment Admieistration,












MARIA SANTOS DELA CRUZ
29911483
‘enh

x Sess)

GIDE
ubuntu@tesseract-ocr:~/TEST$ tesseract unnamed.png - --tessdata-dir ~/tessdata_best
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 195
OFW ID CARD

Ropubkc of he Pisppines
Department of Labor and Employment "a
Phiipplaa Overseas Employment Admieistration












MARIA SANTOS DELA CRUZ
29911483
rn ve

hI [Op410)

[EI
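The same comparison can be scripted so that both systems run exactly the same command. The sketch below builds the argument list for the CLI calls shown above; the helper names are hypothetical. The optional `--dpi` value is an assumption for images that carry no resolution metadata (the warning above suggests this one does not), and `~` must be expanded by the caller because no shell is involved.

```python
import subprocess


def tesseract_command(image, tessdata_dir=None, psm=None, dpi=None):
    """Build an argv list that prints OCR text to stdout ('-' as the output base)."""
    command = ["tesseract", str(image), "-"]
    if tessdata_dir is not None:
        command += ["--tessdata-dir", str(tessdata_dir)]
    if dpi is not None:
        command += ["--dpi", str(dpi)]
    if psm is not None:
        command += ["--psm", str(psm)]
    return command


def run_ocr(image, **options):
    """Run Tesseract and return the recognized text; raises if the command fails."""
    result = subprocess.run(
        tesseract_command(image, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_ocr` once per tessdata directory and comparing the returned strings reproduces the test above without going through pytesseract.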


adesh gautam

Dec 19, 2019, 5:41:17 AM
to tesseract-ocr
AVX and AVX2 are both enabled on both systems.

I am not specifically using tessdata_fast or tessdata_best. I am using the default eng.traineddata that comes with the Windows installer.