Question : can I force Tesseract to follow an existing layout?


Vincent Sarbach-Pulicani

Sep 23, 2022, 11:20:23 AM
to tesseract-ocr
Hello,
I'm working on historical newspapers from the interwar period written in 3 different languages: Corsican, French and Italian.
After many tries, Tesseract seems to be the best OCR engine for me, but the layout analysis of a newspaper is complex.
However, using the API of Gallica (the French national library), I can access an existing OCR output (poor quality) along with usable ALTO files.
My question is: can I use those ALTO files to make Tesseract follow the same segmentation as the existing OCR?
I don't know if my question makes sense.
Thanks a lot,
Vincent Sarbach-Pulicani

Zdenko Podobny

Sep 23, 2022, 12:44:12 PM
to tesser...@googlegroups.com
Tesseract supports uzn files[1] with psm 4. Search the forum for more details.
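
To make the uzn suggestion concrete, here is a minimal Python sketch that turns the block coordinates from an ALTO file into uzn lines. It rests on several assumptions that are not confirmed in this thread: that each uzn line has the form "left top width height label", that the ALTO coordinates are already in pixels of the image being OCRed (ALTO's MeasurementUnit can also be mm10 or inch1200, which would need scaling), and that TextBlock is the right granularity for a newspaper page. The function name alto_to_uzn, the label "Text" and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO sample; a real Gallica file has many more
# elements and may use a different ALTO namespace version.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="400"/>
    <TextBlock ID="b2" HPOS="350.0" VPOS="20.0" WIDTH="300.0" HEIGHT="400.0"/>
  </PrintSpace></Page></Layout>
</alto>"""


def alto_to_uzn(alto_xml: str, label: str = "Text") -> str:
    """Return one 'left top width height label' line per ALTO TextBlock.

    Assumes the ALTO coordinates are pixel values matching the image.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        # Compare the local tag name only, so any ALTO namespace version works.
        if elem.tag.rsplit("}", 1)[-1] != "TextBlock":
            continue
        # ALTO allows fractional coordinates; round them to whole pixels.
        x, y, w, h = (
            round(float(elem.get(attr)))
            for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        lines.append(f"{x} {y} {w} {h} {label}")
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(alto_to_uzn(SAMPLE_ALTO), end="")
```

If the assumptions hold, the output would be saved next to the image under the same base name (for example page.uzn beside page.png) before running something like: tesseract page.png out --psm 4 -l fra+ita. It is worth checking on one page first that the zones are actually picked up.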


On Fri, Sep 23, 2022 at 17:20, Vincent Sarbach-Pulicani <ldar...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/334be2c9-a194-46ee-adcb-ab48b712e3b8n%40googlegroups.com.

Vincent Sarbach-Pulicani

Sep 23, 2022, 12:56:33 PM
to tesser...@googlegroups.com
Ok, I'll check that, thanks again.
