Process HOCR Content to generate Docx | Programmatically

Suresh Kumar

Feb 21, 2021, 1:52:57 AM
to tesseract-ocr
Team,

Currently I'm trying to convert hOCR content (read with an XML parser) to Docx (using docx4j) in Java, in order to generate a Docx file.

Is there any documentation on how I can process the hOCR data and transform it into Docx?

Note: I'm looking to get the bbox info of each ocr_line and use it to position the words in the Docx.
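
To make the bbox step concrete, here is a minimal sketch of what I have in mind, using only the Java standard library. It assumes the usual hOCR convention where the `title` attribute of an `ocr_line` element carries `bbox x0 y0 x1 y1` in pixels. The class name `HocrBbox`, its methods, the sample title string, and the 300 DPI value are all hypothetical choices for illustration; nothing here is a Tesseract or docx4j API. The pixel values are converted to twips (1/1440 inch), the unit WordprocessingML uses for paragraph indents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Tesseract or docx4j.
class HocrBbox {
    // Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
    private static final Pattern BBOX =
            Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");

    /** Returns {x0, y0, x1, y1} in pixels, or null if the title has no bbox. */
    static int[] parseBbox(String title) {
        Matcher m = BBOX.matcher(title);
        if (!m.find()) {
            return null;
        }
        int[] box = new int[4];
        for (int i = 0; i < 4; i++) {
            box[i] = Integer.parseInt(m.group(i + 1));
        }
        return box;
    }

    /** Converts pixels to twips (1/1440 inch) for a given scan resolution. */
    static long pixelsToTwips(int pixels, int dpi) {
        return Math.round(pixels * 1440.0 / dpi);
    }

    public static void main(String[] args) {
        // Sample title attribute of an ocr_line element (made-up values).
        String title = "bbox 36 92 580 122; baseline 0 -5; x_size 30";
        int dpi = 300; // assumed scan resolution; use the real DPI of the image

        int[] box = parseBbox(title);
        System.out.println("left (twips): " + pixelsToTwips(box[0], dpi));
        System.out.println("top (twips): " + pixelsToTwips(box[1], dpi));
    }
}
```

The twips values could then be fed into whatever indent or frame settings are used on the docx4j side; that part is what I have not worked out.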

I noticed this conversation, but I want a programmatic way of processing, so that I can handle all the OCR data effectively and generate a properly formatted Docx file.


Thanks,
Suresh Kumar M

Suresh Kumar

Feb 22, 2021, 3:29:34 PM
to tesseract-ocr
Can anyone please help me with this?