Is there any way to capture any type of formatting?


DASSS

Jun 30, 2023, 1:29:43 AM
to tesseract-ocr
Is there any way to gather information about the actual text formatting (markup, Markdown, etc.)?

Just curious, not expecting.

Screenshot_18.png

Lorenzo Bolzani

Jun 30, 2023, 6:07:57 AM
to tesser...@googlegroups.com
Hi, if you use the API (tesserocr, for example, with Python), you'll get the bounding boxes for each recognized letter/word. From there, with some programming, you can try to spot the "blanks".
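
A minimal sketch of that idea, assuming "blanks" means wide horizontal gaps between neighbouring word boxes. The helper names `find_gaps` and `word_boxes` and the `min_gap` threshold are made up for illustration; the tesserocr calls follow the iterator pattern from its README, but treat the details as an assumption and check them against the version you have installed.

```python
def find_gaps(words, min_gap):
    """Return (left, right) x-ranges of horizontal gaps at least min_gap wide.

    words: list of (text, (x1, y1, x2, y2)) in reading order.
    Only consecutive words whose vertical extents overlap are compared,
    as a rough stand-in for "on the same line".
    """
    gaps = []
    for (_, a), (_, b) in zip(words, words[1:]):
        same_line = a[1] < b[3] and b[1] < a[3]
        if same_line and b[0] - a[2] >= min_gap:
            gaps.append((a[2], b[0]))
    return gaps


def word_boxes(image_path):
    """Collect (text, bounding box) for each recognized word via tesserocr."""
    # Imported here so find_gaps stays usable without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    words = []
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        for item in iterate_level(api.GetIterator(), RIL.WORD):
            words.append((item.GetUTF8Text(RIL.WORD), item.BoundingBox(RIL.WORD)))
    return words
```

Something like `find_gaps(word_boxes("Screenshot_18.png"), min_gap=40)` would then list candidate blanks; the threshold of 40 pixels is arbitrary and would need tuning for the image.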

Even from the command line there is an option to get the full document description: the hOCR output, if I remember correctly.
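
As a rough illustration: running `tesseract page.png out hocr` should write `out.hocr` (the file names are placeholders). The sketch below pulls each word and its bounding box out of such a file using only the standard library. It assumes words are marked as `<span class='ocrx_word' title='bbox x1 y1 x2 y2; ...'>`, which is the usual hOCR convention; the `parse_hocr` helper is hypothetical, not part of Tesseract.

```python
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect text and bbox for every ocrx_word span in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None when outside a word
        self._depth = 0       # nested <span> depth inside the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            self._depth += 1  # e.g. per-character spans inside a word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            m = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            bbox = tuple(int(v) for v in m.groups()) if m else None
            self._current = {"text": "", "bbox": bbox}
            self._depth = 0

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self._current["text"] = self._current["text"].strip()
        self.words.append(self._current)
        self._current = None


def parse_hocr(hocr_text):
    """Return a list of {'text': ..., 'bbox': (x1, y1, x2, y2)} dicts."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words
```

Usage would be along the lines of `parse_hocr(open("out.hocr", encoding="utf-8").read())`. The boxes give layout (position, size, spacing); whether any font styling survives in the output depends on the engine and model, so don't count on it.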



Bye

Lorenzo




Zdenko Podobny

Jul 1, 2023, 6:21:26 AM
to tesser...@googlegroups.com
Have a look at the hOCR output.

Zdenko

