Generate a searchable PDF file in a right-to-left language

Elishai Cohen

Jan 5, 2022, 1:31:01 PM

to tesseract-ocr

Hi,

I'm focused on generating a searchable PDF file in a right-to-left language (e.g. Hebrew or Arabic).

I'm working with Python on Ubuntu and Windows.

When I use tesseract or pytesseract, the results come out in the wrong direction (left to right instead of RTL).

Should I add a language type or something else? Is there another way, such as extracting the text as ALTO XML or hOCR and then combining it with the JPG file to create a searchable PDF?
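For reference, the kind of run being asked about can be sketched as below. This is a minimal sketch, not a fix for the RTL ordering problem: it assumes the `tesseract` command-line tool is installed and on the PATH, that the `heb` (or `ara`) language data is installed, and that the input is a single image. The helper names `build_tesseract_command` and `run_ocr` and the file names in the comments are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="heb",
                            formats=("pdf", "hocr")):
    """Build the argument list for one tesseract CLI run.

    `lang` selects the language data ("heb" for Hebrew, "ara" for Arabic).
    `formats` names tesseract output configs; "pdf" and "hocr" are used here.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, *formats]


def run_ocr(image_path, output_base, lang="heb", formats=("pdf", "hocr")):
    """Run tesseract once and write every requested output format.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    cmd = build_tesseract_command(image_path, output_base, lang, formats)
    subprocess.run(cmd, check=True)
    return cmd


# Hypothetical usage: run_ocr("page.jpg", "page") would write page.pdf
# (image plus a text layer) and page.hocr next to each other.
```

Requesting both outputs in one run makes it possible to compare the text layer of the PDF with the hOCR text for the same page, which may help narrow down where the direction goes wrong.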

Looking forward to your advice.

Thanks in advance,

Elishai

Zdenko Podobny

Jan 5, 2022, 2:53:31 PM

to tesser...@googlegroups.com

Maybe you can start by reading this:

https://github.com/tesseract-ocr/tesseract/issues/238

Zdenko

On Wed, Jan 5, 2022 at 19:30, Elishai Cohen <elisha...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/22c40308-4200-4f31-bd29-14cff1425c40n%40googlegroups.com.

Elishai Cohen

Jan 5, 2022, 3:02:25 PM

to tesser...@googlegroups.com

Thanks.

I had read it before, but I have seen some examples of searchable PDF files that were generated by tesseract.

I don't know what the process was, so I'm asking here.

Thanks,

Elishai


Anand babu

Dec 8, 2022, 7:10:51 AM

to tesseract-ocr

Hi Elishai, I'm working on a similar project: converting a scanned PDF to a searchable PDF with colour coding. Could you please share some guidance on this?
