Paragraph-wise training

Krishna Prasad

Jul 16, 2019, 1:09:26 AM
to tesseract-ocr
Hi people,

I am trying to retrain Tesseract with the dataset at https://www.primaresearch.org/repository/index/IMPACT_Digitisation

As I read in the documentation, the input for retraining Tesseract is a single line (an image of one line of text with its accompanying ground truth). Is it possible to train on paragraphs instead, since the dataset contains ground truth only paragraph-wise?

Would that help increase accuracy? Do you know of any tools for detecting the lines in a paragraph?

I think that if I use OpenCV image processing to split a paragraph into text lines, it would fail on some images. Please suggest a better solution if possible. Thanks in advance.
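
For context, one common way to split a paragraph image into lines is a horizontal projection profile: count the ink pixels in each row and cut wherever the count drops to zero. Below is a minimal sketch in plain Python. It assumes the image is already binarized into nested lists of 0 (background) and 1 (ink); the function name `split_lines` and its parameters are only illustrative, and this is not necessarily the OpenCV approach meant above. It is the kind of method that tends to fail on skewed, noisy, or touching lines.

```python
def split_lines(binary_image, min_ink=1, min_height=1):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    binary_image: list of rows, each a list of 0 (background) / 1 (ink).
    min_ink and min_height are illustrative thresholds, not tuned values.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary_image]

    lines = []
    start = None  # first row of the line currently being tracked
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # entering a band of rows that contain ink
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))  # leaving the band: record it
            start = None
    # A line that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines


# Toy 6x4 "paragraph" with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(split_lines(page))  # [(1, 3), (4, 5)]
```

Each returned range could then be cropped out and paired with the matching line of the paragraph ground truth, provided the number of detected lines equals the number of ground-truth lines.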

Regards,
Krishna Prasad A S