Tesseract Recognition using psm13 for characters like "t", "i", "j"


Purushotham Rao Eravalli

Sep 30, 2019, 6:29:00 AM
to tesseract-ocr
Hi,

I retrained Tesseract with Calibri and Arial. While testing on cropped text images, I am facing an issue where the characters "t", "i", and "j" are all recognised as "l", and sometimes "e" is recognised as "a". Does anyone have a solution for this?


Thanks,
Purushotham

Zdenko Podobny

Sep 30, 2019, 9:05:51 AM
to tesser...@googlegroups.com
Can you provide test images?
I do not think there is any need to retrain Tesseract for a common font like Arial.

Zdenko


On Mon, Sep 30, 2019 at 12:29, Purushotham Rao Eravalli <purus...@sukshi.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/577d1038-e809-42b4-8e3c-242e04f77d22%40googlegroups.com.

Purushotham Rao Eravalli

Sep 30, 2019, 9:11:45 AM
to tesser...@googlegroups.com
Hi,
Please look at these images.


Thanks

5e07a43c069f76fcb85505f8dcda1721.jpg_front2-476-4.jpg
8aa8ea34feb16d5ee596e05fffe4c81f.jpg_front2-201-6.jpg

Purushotham Rao Eravalli

Sep 30, 2019, 9:12:39 AM
to tesseract-ocr

8aa8ea34feb16d5ee596e05fffe4c81f.jpg_front2-201-6.jpg

5e07a43c069f76fcb85505f8dcda1721.jpg_front2-476-4.jpg

Zdenko Podobny

Sep 30, 2019, 10:04:48 AM
to tesser...@googlegroups.com
>tesseract front2-201-6.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 148
Aimanam, Pulikkuttissery.

>tesseract front2-476-4.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 170
S/O: ltvari Lal, Village patti kuki.


>tesseract -v
tesseract 5.0.0-alpha-456-g021f
 leptonica-1.79.0 (Sep 16 2019, 13:25:21) [MSC v.1916 LIB Release x64]
  libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8


IMO 4.1 should produce the same result. I use the model from tessdata_best with the default page segmentation mode, as in the commands above.
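For anyone who wants to repeat this comparison from a script, here is a minimal Python sketch. It only wraps the tesseract command line (`--tessdata-dir`, `-l`, `--psm`); the helper names, the `eng` language, and the `/path/to/tessdata_best` directory are assumptions for illustration, and the exact command used for the retrained model is not shown in this thread.

```python
import subprocess
from typing import Dict, List, Optional, Sequence


def build_tesseract_cmd(image: str,
                        tessdata_dir: Optional[str] = None,
                        lang: str = "eng",
                        psm: Optional[int] = None) -> List[str]:
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    cmd = ["tesseract", image, "-"]  # "-" as the output base means stdout
    if tessdata_dir is not None:
        # Directory holding the .traineddata files, e.g. a tessdata_best checkout
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang]
    if psm is not None:
        # Page segmentation mode: 7 = single text line, 13 = raw line.
        # Leaving it out keeps tesseract's default mode.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image: str, **kwargs) -> str:
    """Run tesseract (must be on PATH) and return the recognized text."""
    result = subprocess.run(build_tesseract_cmd(image, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def compare_psm_modes(image: str,
                      tessdata_dir: str,
                      modes: Sequence[Optional[int]] = (None, 7, 13)
                      ) -> Dict[Optional[int], str]:
    """Recognize the same image under several page segmentation modes."""
    return {psm: run_ocr(image, tessdata_dir=tessdata_dir, psm=psm)
            for psm in modes}
```

Calling `compare_psm_modes("front2-201-6.jpg", "/path/to/tessdata_best")` would return one result per mode, which makes it easy to see whether the "t"/"i"/"j" confusion comes from psm 13 or from the retrained model.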

Zdenko



Purushotham Rao Eravalli

Sep 30, 2019, 10:08:05 AM
to tesser...@googlegroups.com
Thank you very much. I will look into the versions and get back to you.

