vertical and horizontal text


gv

Jan 5, 2022, 7:02:59 AM
to tesseract-ocr
I have PDFs that contain both vertical and horizontal text. Some column headers are vertical, printed from bottom to top.

I get good results for the horizontal text. I then rotated the PDF so that the vertical text became horizontal, and the results for that text are good too.

But how do I combine both results into a single PDF? Is it possible to have horizontal and vertical text in the same PDF?

Regards

Zdenko Podobny

Jan 5, 2022, 3:03:57 PM
to tesser...@googlegroups.com
I am afraid tesseract is not able to do this by itself. tesseract is an OCR engine. What you describe is a fairly complex solution:
  1. Analyze the input image: detect the regions and their features (text, table, picture, graphics? How is each region rotated relative to the page and to the other regions? etc.)
  2. OCR each region individually. This is tesseract's area of expertise.
  3. Generate the output based on the information above.
IMO for 1. and 3. you need to use other tools.

Zdenko
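
One small piece of step 3 can be sketched in plain Python: once the page has been OCRed twice (once as-is, once rotated 90 degrees clockwise so that the bottom-to-top headers read left to right), the word boxes from the rotated pass have to be mapped back to the original page coordinates before the two results can be merged. The sketch below is only an illustration under assumptions. `WordBox`, `unrotate_cw90`, `overlaps` and `merge_passes` are hypothetical names, not part of tesseract, and resolving overlaps by keeping the higher-confidence box is a simple heuristic chosen for the example. The box fields mirror the `left`, `top`, `width`, `height`, `conf` and `text` columns of tesseract's TSV output, which is one possible source for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordBox:
    """One recognized word. Hypothetical container, not a tesseract API."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float
    rotation: int = 0  # 0 = horizontal, 90 = reads bottom to top on the original page


def unrotate_cw90(box: WordBox, orig_height: int) -> WordBox:
    """Map a box found on the clockwise-rotated page back to the original page.

    A clockwise quarter turn sends an original point (x, y) to
    (orig_height - y, x), so the inverse is x = new_y, y = orig_height - new_x.
    """
    return WordBox(
        text=box.text,
        left=box.top,
        top=orig_height - box.left - box.width,
        width=box.height,
        height=box.width,
        conf=box.conf,
        rotation=90,
    )


def overlaps(a: WordBox, b: WordBox) -> bool:
    """True if the two axis-aligned boxes share any area."""
    return (a.left < b.left + b.width and b.left < a.left + a.width
            and a.top < b.top + b.height and b.top < a.top + a.height)


def merge_passes(horizontal: List[WordBox], rotated: List[WordBox],
                 orig_height: int) -> List[WordBox]:
    """Combine both OCR passes in original page coordinates.

    Assumed heuristic: where boxes from the two passes overlap, the one with
    the higher confidence wins, since sideways text tends to OCR as
    low-confidence noise.
    """
    candidates = list(horizontal) + [unrotate_cw90(b, orig_height) for b in rotated]
    kept: List[WordBox] = []
    for box in sorted(candidates, key=lambda b: b.conf, reverse=True):
        if not any(overlaps(box, k) for k in kept):
            kept.append(box)
    return kept


# Example: a page 200 px high with one horizontal word and one vertical header.
horizontal_pass = [
    WordBox("Name", 30, 10, 40, 10, conf=95.0),
    WordBox("~|", 20, 150, 8, 20, conf=20.0),    # noise from the sideways header
]
rotated_pass = [
    WordBox("Total", 10, 20, 50, 8, conf=90.0),  # the header, now readable
    WordBox("x", 180, 30, 10, 40, conf=15.0),    # noise from the sideways "Name"
]
merged = merge_passes(horizontal_pass, rotated_pass, orig_height=200)
```

Each box in `merged` carries its position on the original page plus a `rotation` value. A PDF text layer can place every word with its own rotation, so a PDF-writing tool of your choice could use these values to emit both horizontal and vertical text on one page. Detecting which regions are vertical in the first place (step 1) is not covered here.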

