Tesseract 4.0.0 fails to extract some words from the attached form


Russia Aiyappa

Feb 22, 2019, 3:39:40 PM
to tesseract-ocr
Tesseract fails to extract some words, such as "Monthly" and "Total" (under section V), from the attached form. Using the PRImA tools, I found that "Monthly" was omitted because it wasn't segmented correctly, while "Total" wasn't extracted even though it fell within a segmented region.

Any idea what could have caused this behavior and how to fix it? I used PSM 3.

Thank you.
segmentatoin.PNG
form.tif

sachin chavan

Feb 28, 2019, 11:13:07 PM
to tesser...@googlegroups.com
I'm facing the same issue as well.


Zdenko Podobny

Mar 1, 2019, 2:26:00 AM
to tesser...@googlegroups.com
Tesseract's trouble with OCR of tables is a known problem. Search the archive and the issue tracker.

Zdenko
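
For readers hitting the same symptom, one low-cost experiment is to rerun the form under several page segmentation modes and check which of the expected words come back. The sketch below is only an illustration, not a fix confirmed in this thread: it assumes the `tesseract` binary is on PATH, and the helper names (`build_command`, `missing_words`, `sweep_psm`) and the list of modes are hypothetical choices made for this example. The table-layout limitation mentioned above may still apply in every mode.

```python
import subprocess

# Modes to compare (an assumed selection): 3 is the default used in the
# original post, 4 = single column, 6 = single uniform block,
# 11 = sparse text, 12 = sparse text with OSD.
PSM_CANDIDATES = (3, 4, 6, 11, 12)


def build_command(image_path, psm):
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def missing_words(text, expected):
    """Return the expected words that do not appear in the OCR text."""
    found = {w.strip(".,:;()").lower() for w in text.split()}
    return [w for w in expected if w.lower() not in found]


def sweep_psm(image_path, expected, psms=PSM_CANDIDATES):
    """Run Tesseract once per mode; map each mode to the words it missed."""
    report = {}
    for psm in psms:
        result = subprocess.run(
            build_command(image_path, psm),
            capture_output=True,
            text=True,
            check=True,
        )
        report[psm] = missing_words(result.stdout, expected)
    return report


# Example call (requires the tesseract binary and the image file):
#   sweep_psm("form.tif", ["Monthly", "Total"])
```

A mode whose entry in the returned dictionary is an empty list recovered every expected word. If no mode does, cropping the table region before OCR is another thing to try.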


On Fri, Mar 1, 2019 at 5:13, sachin chavan <sac...@mollatech.com> wrote: