How to assess the quality of Tesseract OCR output programmatically?


nitin

Jun 13, 2018, 3:11:24 AM
to tesseract-ocr
Dear members,

Is there a way to assess the quality of Tesseract OCR output?

I need to provide such statistics along with the scanned image-to-PDF output files, 
so that users can sort the results and decide whether the output quality is acceptable (for example, above 50% or 80% of the text recognized successfully).
I also need to determine this programmatically.

Thanks for your time.
Regards

ShreeDevi Kumar

Jun 13, 2018, 4:24:51 AM
to tesser...@googlegroups.com
You can compare the OCRed text with ground-truth text. If you are creating a PDF, you will have to extract the text from it before comparing.
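
As a rough illustration of such a comparison, here is a minimal Python sketch that scores OCR text against ground truth using character-level edit distance. The function names (`levenshtein`, `character_accuracy`) are made up for this example and are not part of Tesseract or any evaluation tool. It assumes both texts are already available as plain strings, and that "accuracy" means 1 minus the character error rate, which is only one of several possible definitions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; each insert, delete or substitute costs 1."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Return 1 - CER clamped to [0, 1], where CER = distance / len(ground_truth)."""
    if not ground_truth:
        # Assumption: empty ground truth is a perfect match only for empty OCR text.
        return 1.0 if not ocr_text else 0.0
    cer = levenshtein(ocr_text, ground_truth) / len(ground_truth)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    truth = "Tesseract OCR"
    ocr = "Tesseract 0CR"  # the letter O misread as a zero
    print(f"{character_accuracy(ocr, truth):.1%}")  # one error in 13 characters
```

A score like this could then be checked against a threshold such as the 50% or 80% mentioned in the question. Whitespace and line-break differences count as errors here, so normalizing both texts first may be worthwhile.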

There are two options:


or

