Fine-tuning Tesseract OCR

Deepak Sharma

Jan 8, 2021, 8:48:56 AM
to tesseract-ocr
I am trying to use Google's Tesseract OCR to extract text from documents. 

When I run Tesseract on documents that contain many colors, the OCR performs poorly. (Please see the attachment "image1.jpeg".)

So I am trying to fine-tune the OCR using such sample documents as training data. Will fine-tuning Tesseract OCR really make a difference?

1) If yes, how many such sample documents should I prepare as training data?
2) If no, what approach should I take to extract text from colored documents? Or can I retrain the model from scratch?
3) Or should I look for other OCR models? If so, are there any OCR models that are best suited for my purpose?

DKATZ-UI-2016.pdf_0.jpeg
image1.jpeg.jpeg