Generate a searchable PDF file in a right-to-left language


Elishai Cohen

Jan 5, 2022, 1:31:01 PM
to tesseract-ocr
Hi,

I'm focused on generating a searchable PDF file in a right-to-left language (e.g. Hebrew or Arabic).

I'm working with Python on Ubuntu and Windows.

When I use tesseract or pytesseract, the results come out in the wrong direction (left to right instead of RTL).

Should I add a language type or something else? Is there another way, such as extracting the text as ALTO XML or hOCR and then combining it with the JPG file to create a searchable PDF?

Looking forward to your advice.

Thanks in advance,
Elishai
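
For illustration, the two routes asked about above (direct PDF output, or hOCR/ALTO output to combine with the image later) could be requested from the Tesseract command line roughly as below. This is only a sketch under assumptions: it assumes the `tesseract` binary is on the PATH and that the `heb` (or `ara`) language data is installed; the helper names `build_tesseract_cmd`, `expected_outputs` and `run_ocr` are hypothetical and not part of Tesseract or pytesseract. It shows how the language and output formats are selected and does not by itself guarantee that the text layer comes out in the correct RTL order.

```python
import subprocess

# File suffix Tesseract writes for each output config name.
OUTPUT_SUFFIXES = {"pdf": ".pdf", "hocr": ".hocr", "alto": ".xml"}


def build_tesseract_cmd(image, out_base, langs=("heb",), outputs=("pdf",)):
    """Build the argument list: tesseract IMAGE OUTBASE -l LANGS CONFIG..."""
    cmd = ["tesseract", str(image), str(out_base), "-l", "+".join(langs)]
    # Trailing config names select the output formats (pdf, hocr, alto).
    cmd.extend(outputs)
    return cmd


def expected_outputs(out_base, outputs=("pdf",)):
    """Return the file names Tesseract is expected to write for these configs."""
    return [str(out_base) + OUTPUT_SUFFIXES[name] for name in outputs]


def run_ocr(image, out_base, langs=("heb",), outputs=("pdf", "hocr")):
    """Run Tesseract (assumed to be installed) and return the expected output paths."""
    subprocess.run(build_tesseract_cmd(image, out_base, langs, outputs), check=True)
    return expected_outputs(out_base, outputs)


if __name__ == "__main__":
    # Hypothetical input file; writes page.pdf and page.hocr next to it.
    print(run_ocr("page.jpg", "page", langs=("heb",), outputs=("pdf", "hocr")))
```

With pytesseract, a similar result should be obtainable through `pytesseract.image_to_pdf_or_hocr(image, lang="heb", extension="pdf")`, which returns the PDF (or hOCR) content as bytes.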

Zdenko Podobny

Jan 5, 2022, 2:53:31 PM
to tesser...@googlegroups.com
Maybe you can start with this reading:

On Wed, Jan 5, 2022 at 19:30, Elishai Cohen <elisha...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/22c40308-4200-4f31-bd29-14cff1425c40n%40googlegroups.com.

Elishai Cohen

Jan 5, 2022, 3:02:25 PM
to tesser...@googlegroups.com
Thanks.
I read it before, but I have seen some examples of searchable PDF files that were generated by tesseract.
I do not know what the process was, so I'm asking here.

Thanks,
Elishai 



Anand babu

Dec 8, 2022, 7:10:51 AM
to tesseract-ocr
Hi Elishai, I'm working on a similar project, as part of my project work, to convert a scanned PDF into a searchable PDF with colour coding. Could you please share some guidance on this?