Tesseract does not extract table contents properly


Manasi sarode

May 30, 2019, 9:56:24 AM
to tesseract-ocr
I'm trying to extract the contents of a table using Tesseract, but the results are not accurate: some of the contents are missing. Also, is there any function/API for table content extraction?

Observations from the attached screenshots:

1) "Mombai" is detected as "Mombal".
2) "Rawalpindi" is detected as "Rawalpind!".
3) There are extra spaces before and after the text.
4) There is an underscore (_) after the numbers in the second column.
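
Observations 3 and 4 could probably be worked around by post-processing the recognized text. The following is only a minimal sketch, assuming observation 3 means stray whitespace around the cell text and that each table row arrives as one line with cells separated by a known delimiter. `clean_cell`, `clean_row`, and the `|` separator are hypothetical names chosen for illustration and are not part of Tesseract.

```python
import re
from typing import List


def clean_cell(text: str) -> str:
    """Normalize one OCR'd table cell (hypothetical helper, not a Tesseract API)."""
    # Collapse runs of whitespace and trim both ends (observation 3).
    text = re.sub(r"\s+", " ", text).strip()
    # Drop underscores/spaces that trail a digit at the end of the cell (observation 4).
    text = re.sub(r"(?<=\d)[_\s]+$", "", text)
    return text


def clean_row(line: str, sep: str = "|") -> List[str]:
    """Split one recognized line into cells and clean each one.

    The separator is an assumption; real OCR output may need a different split rule.
    """
    return [clean_cell(cell) for cell in line.split(sep)]


if __name__ == "__main__":
    print(clean_row("  Rawalpindi  |  1234_ "))
```

This does not address the character confusions in observations 1 and 2 ("i" read as "l" or "!"); those depend on the recognition itself, and are more likely to be affected by the quality and resolution of the input image than by any text cleanup.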

tbl.PNG
tbl2.PNG

Zdenko Podobny

May 30, 2019, 11:04:02 AM
to tesser...@googlegroups.com

On Thu, May 30, 2019 at 11:56, Manasi sarode <manasi.s...@gmail.com> wrote: