Get the width and height of each character in an image PDF file

G. S.

May 31, 2019, 9:37:15 AM
to tesseract-ocr
Dear all,

I have an image PDF file (in Greek).

I would appreciate your help with how I could:

a) get output similar to what pdfalto produces,

but, more importantly, get the position, width and height of each individual character.

So far, pdfalto treats each word as a token, so its output is per word.

Please tell me how you would approach this: which command and which parameters would you use?

Thank you very much in advance.

Shree Devi Kumar

May 31, 2019, 11:31:07 AM
to tesser...@googlegroups.com
I think the hOCR output has an option to also output bounding-box information per character.
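
A minimal sketch of how that could look, under a few assumptions that are not confirmed in this thread: that your Tesseract build supports the `hocr_char_boxes` config variable, that the PDF pages have first been converted to images (Tesseract does not read PDF input directly), and that each character appears in the hOCR as an `ocrx_cinfo` span whose title starts with `x_bboxes left top right bottom`. The file names and the `char_boxes` helper are hypothetical, and the sample markup is hand-written for illustration, not real Tesseract output.

```python
import re
from html import unescape

# Assumed command to produce the hOCR (page.png and out are placeholder names):
#   tesseract page.png out -l ell -c hocr_char_boxes=1 hocr
# This should write out.hocr, which can then be read and passed to char_boxes().

# Assumed shape of a per-character span:
#   <span class='ocrx_cinfo' title='x_bboxes L T R B; x_conf C'>c</span>
CINFO = re.compile(
    r"<span class=['\"]ocrx_cinfo['\"] "
    r"title=['\"]x_bboxes (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"]>(.*?)</span>",
    re.S,
)

def char_boxes(hocr_text):
    """Return one dict per character: position (x, y), width and height in pixels."""
    boxes = []
    for m in CINFO.finditer(hocr_text):
        left, top, right, bottom = map(int, m.group(1, 2, 3, 4))
        boxes.append({
            "char": unescape(m.group(5)),  # decode entities such as &amp;
            "x": left,
            "y": top,
            "width": right - left,
            "height": bottom - top,
        })
    return boxes

# Hand-written sample in the assumed format (two Greek characters).
SAMPLE = (
    "<span class='ocrx_cinfo' title='x_bboxes 10 20 30 50; x_conf 99.1'>Κ</span>"
    "<span class='ocrx_cinfo' title='x_bboxes 32 28 48 50; x_conf 97.0'>α</span>"
)

if __name__ == "__main__":
    for box in char_boxes(SAMPLE):
        print(box)
```

The coordinates are in pixels of the rasterized page, with the origin at the top-left corner, so they would need scaling by the rendering resolution to map back to PDF units.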


Shree Devi Kumar

Jun 1, 2019, 8:43:26 AM
to tesser...@googlegroups.com