Paragraph-wise training

Krishna Prasad

Jul 16, 2019, 1:09:26 AM

to tesseract-ocr

Hi People,

I am trying to retrain Tesseract on the dataset at https://www.primaresearch.org/repository/index/IMPACT_Digitisation

As I read in the documentation, the input for retraining Tesseract is a line (an image of a single line of text with its accompanying ground truth). Is it possible to train on paragraphs instead, since the dataset contains ground truth only at the paragraph level?

Would that help increase accuracy? Does anyone know of tools to detect the lines within a paragraph?

I think that if I use OpenCV image processing to split a paragraph into text lines, it would fail for some images. Please suggest a better solution if possible. Thanks in advance.
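To make the concern concrete, here is a minimal sketch of the kind of naive splitting I mean: a horizontal projection profile over a binarized paragraph image. It is written in plain Python on a list of 0/1 rows rather than with OpenCV, and the function name `split_lines` and the `min_ink` threshold are only illustrative assumptions. It works when lines are separated by clean blank rows, and it is exactly the sort of approach that breaks on skewed scans or on lines whose ascenders and descenders touch.

```python
from typing import List, Tuple


def split_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    `binary` is a paragraph image as rows of 0/1 values, where 1 means ink.
    A row counts as part of a line when it holds at least `min_ink` ink pixels.
    """
    # Horizontal projection profile: amount of ink in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None  # row index where the current line began, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:                  # image ends inside a line
        lines.append((start, len(profile)))
    return lines


# Tiny hypothetical "paragraph": two lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(split_lines(page))
```

Each returned range could then be cropped out and paired with the matching line of the paragraph ground truth, assuming the ground-truth text preserves the original line breaks.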

Regards,

Krishna Prasad A S
