Question: can I force Tesseract to follow an existing layout?

Vincent Sarbach-Pulicani

Sep 23, 2022, 11:20:23 AM

to tesseract-ocr

Hello,

I'm working on historical newspapers from the interwar period written in three languages: Corsican, French and Italian.

After many tries, Tesseract seems to be the best OCR engine for me, but the layout analysis of a newspaper page is complex.

However, using the API of Gallica (French national library), I can access existing OCR output (of poor quality) and usable ALTO files.

My question is: can I use those ALTO files to make Tesseract follow the same segmentation as the existing OCR?

I don't know if my question makes sense.

Thanks a lot,

Vincent Sarbach-Pulicani

Zdenko Podobny

Sep 23, 2022, 12:44:12 PM

to tesser...@googlegroups.com

Tesseract supports uzn files [1] with psm 4. Search the forum for more details.

[1] https://github.com/OpenGreekAndLatin/greek-dev/wiki/uzn-format
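
To make the ALTO-to-uzn idea concrete, here is a minimal sketch in Python, not a tested pipeline. It assumes the ALTO `TextBlock` elements carry `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` in pixels of the image you feed to Tesseract; if the ALTO uses another unit or resolution, the hypothetical `scale` parameter has to be set accordingly. The function name `alto_to_uzn` and the `Text` label are illustrative choices.

```python
import xml.etree.ElementTree as ET


def alto_to_uzn(alto_xml: str, scale: float = 1.0) -> str:
    """Build UZN text: one 'left top width height label' line per ALTO TextBlock.

    Assumption: ALTO coordinates times `scale` give pixel coordinates
    of the image that will be passed to Tesseract.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for el in root.iter():
        # Ignore the XML namespace so different ALTO versions are handled alike.
        if el.tag.rsplit("}", 1)[-1] != "TextBlock":
            continue
        raw = [el.get(name) for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
        if any(value is None for value in raw):
            continue  # skip blocks without a complete bounding box
        x, y, w, h = (round(float(value) * scale) for value in raw)
        lines.append(f"{x} {y} {w} {h} Text")
    return "\n".join(lines) + "\n"
```

The result would be saved next to the image under the same base name (for example `page.uzn` beside `page.png`) before running something like `tesseract page.png out --psm 4`.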

Zdenko

On Fri, Sep 23, 2022 at 17:20, Vincent Sarbach-Pulicani <ldar...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/334be2c9-a194-46ee-adcb-ab48b712e3b8n%40googlegroups.com.

Vincent Sarbach-Pulicani

Sep 23, 2022, 12:56:33 PM

to tesser...@googlegroups.com

Ok, I'll check that, thanks again.

