pytesseract - how to improve the quality of extracted text

yoganand

Mar 22, 2019, 12:06:47 PM
to tesseract-ocr
Hello,

I'm building an OCR tool to read selected fields from invoices. I used Tesseract, and these are the problems I'm facing:
1) I'm not able to get table structures as they are. I'm at least expecting a pipe symbol between columns, which would help in parsing the text.
2) A few characters were not extracted correctly. How can I improve the quality? Does training Tesseract 4 help?
3) Why would one train Tesseract 4 additionally?
4) Is there any option I can use to keep the white space between words and the text alignment as they are in the image after conversion?

I have spent almost a month on this and was able to build an OCR tool with about 40% accuracy.
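
For questions 2 and 4, some behavior can be changed through Tesseract's own options before considering retraining. pytesseract appends a `config` string to the tesseract command line; the hypothetical helper `tesseract_config` below only assembles one. `--psm 6` (treat the image as a single uniform block of text), `preserve_interword_spaces=1` (keep runs of spaces between words) and `tessedit_char_whitelist` (restrict output to the listed characters) are real Tesseract parameters, but whether they help depends on the invoice layout, and the whitelist may be ignored by the LSTM engine in Tesseract 4.0 (support returned in 4.1).

```python
def tesseract_config(psm=6, preserve_spaces=True, whitelist=None):
    """Build a config string for pytesseract's `config=` argument (hypothetical helper)."""
    # --psm 6: treat the image as one uniform block of text.
    parts = [f"--psm {psm}"]
    if preserve_spaces:
        # Keep runs of spaces between words instead of collapsing them to one.
        parts.append("-c preserve_interword_spaces=1")
    if whitelist:
        # Restrict recognition to these characters, e.g. for amount fields.
        # The whitelist must not contain spaces, since the config string is split on them.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(tesseract_config(whitelist="0123456789.,"))
# --psm 6 -c preserve_interword_spaces=1 -c tessedit_char_whitelist=0123456789.,
```

With Tesseract and pytesseract installed, the string would be used as `pytesseract.image_to_string(img, config=tesseract_config())`, where `img` is the loaded invoice image.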

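For questions 1 and 4, Tesseract does not output table borders, but `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns every word with its bounding box (`left`, `top`, `width`, `height`) and `text`. The sketch below uses a hypothetical helper, `rows_with_pipes`, which is not taken from this thread. It groups words into visual rows by their vertical center and inserts ` | ` wherever the horizontal gap between neighboring words exceeds `gap_px`. Both `gap_px=40` and `row_tol=10` are assumed pixel values that would need tuning for the scan resolution, and the `sample` dict is made-up data standing in for real `image_to_data` output.

```python
def rows_with_pipes(data, gap_px=40, row_tol=10):
    """Rebuild table-like rows from a pytesseract image_to_data dict.

    gap_px and row_tol are hypothetical pixel thresholds to tune per scan.
    """
    words = []
    for i, text in enumerate(data["text"]):
        text = str(text).strip()
        if not text:
            continue  # page/block/paragraph/line entries have empty text
        center_y = data["top"][i] + data["height"][i] / 2
        words.append((center_y, data["left"][i], data["width"][i], text))

    # Group words whose vertical center is within row_tol of the row's first word.
    words.sort()
    rows = []
    for word in words:
        if rows and abs(word[0] - rows[-1][0][0]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    lines = []
    for row in rows:
        row.sort(key=lambda w: w[1])  # order words left to right
        parts = [row[0][3]]
        for prev, cur in zip(row, row[1:]):
            gap = cur[1] - (prev[1] + prev[2])  # space between the two boxes
            parts.append(" | " if gap > gap_px else " ")
            parts.append(cur[3])
        lines.append("".join(parts))
    return lines


# Made-up stand-in for pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
sample = {
    "text":   ["", "Item", "Qty", "Unit", "Price", "Widget", "2", "9.99"],
    "left":   [0, 10, 200, 300, 345, 10, 200, 300],
    "top":    [0, 10, 12, 11, 11, 50, 51, 50],
    "width":  [400, 40, 30, 40, 50, 60, 10, 40],
    "height": [80, 20, 20, 20, 20, 20, 20, 20],
}

for line in rows_with_pipes(sample):
    print(line)
# Item | Qty | Unit Price
# Widget | 2 | 9.99
```

The pipe markers are only a heuristic. Merged cells, wrapped descriptions, or skewed scans can break the row grouping, so deskewing the image first may help.
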
Shree Devi Kumar

Mar 22, 2019, 1:25:27 PM
to tesser...@googlegroups.com

kailash hambarde

Apr 1, 2019, 1:12:19 AM
to tesseract-ocr
Same problem here. Did you find a solution?