Why does Tesseract fail on this image?


Tariq Ahmad

Jun 10, 2020, 12:50:38 PM
to tesseract-ocr
I cannot understand why Tesseract fails on this (cropped) image:


Yet if I add a random white border it works:


Can anyone shed any light please?
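
For anyone reproducing this, the border workaround described above could be sketched roughly as follows. This is only a minimal illustration, assuming Pillow is installed; the helper name `add_white_border` and the 10-pixel default are made up for the example and are not taken from the original post.

```python
def add_white_border(image, border: int = 10):
    """Return a copy of `image` padded on every side with white pixels.

    `image` is expected to be a PIL.Image.Image. The helper name and the
    default border width are illustrative assumptions.
    """
    # Imported here so the helper can be defined even where Pillow is missing.
    from PIL import ImageOps

    # ImageOps.expand adds `border` pixels around the image, filled with `fill`.
    return ImageOps.expand(image, border=border, fill="white")
```

The padded image can then be passed to pytesseract in place of the cropped one, e.g. `pytesseract.image_to_string(add_white_border(img))`. Note that any box coordinates returned for the padded image are shifted by the border width relative to the original crop.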

Zdenko Podobny

Jun 11, 2020, 2:30:50 PM
to tesser...@googlegroups.com

On Wed, 10 Jun 2020 at 18:50, 'Tariq Ahmad' via tesseract-ocr <tesser...@googlegroups.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/280cee80-aad1-4245-8346-25d87d447730o%40googlegroups.com.

Tariq Ahmad

Jun 12, 2020, 6:31:42 AM
to tesseract-ocr

Many thanks for your reply - useful to know.

I now find that pytesseract is returning the wrong coordinates for individual characters. For example, for this image (which has a 10-pixel border):

image_to_boxes returns:

A: 17 32 10 22
L: 17 32 24 33
etc.

I believe these are interpreted as (left, bottom, right, top), and when I extract the image for the letter A I get:


However, the same code works correctly for:



Zdenko Podobny

Jun 12, 2020, 9:09:39 AM
to tesser...@googlegroups.com
Search the forum and issue tracker: there is an explanation of why the LSTM engine cannot give exact character box coordinates.
If you need exact character boxes, IMO you need to use the legacy engine (but it could have other problems).
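
As a rough illustration of the above (not a definitive recipe), here is a sketch of parsing `image_to_boxes` output and cropping characters with Pillow. It assumes the usual box-file row format `char left bottom right top page`, with the origin at the bottom-left of the image, so the y values are flipped before cropping because Pillow measures from the top-left. The names `CharBox`, `parse_boxes`, `to_pil_crop` and `crop_characters` are made up for this example.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    """One character box, in Tesseract's bottom-left-origin coordinates."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(raw: str) -> List[CharBox]:
    """Parse box text with one 'char left bottom right top page' row per line."""
    boxes = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed rows
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(CharBox(parts[0], left, bottom, right, top))
    return boxes


def to_pil_crop(box: CharBox, image_height: int) -> Tuple[int, int, int, int]:
    """Convert to Pillow's (left, upper, right, lower), measured from the top-left."""
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


def crop_characters(image, config: str = ""):
    """Return (char, cropped image) pairs for a PIL image.

    Requires pytesseract and a Tesseract install; imported lazily so the
    parsing helpers above remain usable without them.
    """
    import pytesseract

    raw = pytesseract.image_to_boxes(image, config=config)
    return [(b.char, image.crop(to_pil_crop(b, image.height)))
            for b in parse_boxes(raw)]
```

To try the legacy engine, `crop_characters(img, config="--oem 0")` should select it, assuming the installed traineddata file actually contains the legacy model; otherwise Tesseract will report an error.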

Zdenko


On Fri, 12 Jun 2020 at 12:31, 'Tariq Ahmad' via tesseract-ocr <tesser...@googlegroups.com> wrote: