Text extraction

Omar Sherif

Oct 11, 2024, 3:25:52 PM
to tesseract-ocr
I want to extract the text in the attached image while preserving its structure, but I couldn't find anything about that in the documentation.
page-5.png

Zdenko Podobny

Oct 12, 2024, 4:17:29 AM
to tesser...@googlegroups.com
Hello,

Tesseract is an OCR engine that can handle images with simple layouts, such as book pages.
For images with complex layouts (e.g. tables, a lot of graphics), you need to combine it with other tools for preprocessing (identifying text areas, removing graphics) and postprocessing (layout reconstruction, coloring).
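
To make the postprocessing step (layout reconstruction) a bit more concrete, here is a minimal Python sketch. It assumes you have already run something like `tesseract page-5.png out tsv` and have the resulting TSV text, which lists each word with its bounding box. The function name `layout_from_tsv` and the parameters `char_width` and `row_tolerance` are made up for illustration and are not part of Tesseract; the values need tuning for your image. It is only one rough way to approximate the structure, not a complete solution for complex pages.

```python
import csv
import io


def layout_from_tsv(tsv_text, char_width=10, row_tolerance=10):
    """Rebuild a rough monospaced layout from Tesseract TSV output.

    char_width: assumed average glyph width in pixels (hypothetical knob).
    row_tolerance: max vertical distance in pixels for two words to be
    treated as part of the same visual row (hypothetical knob).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; other levels are containers.
        if row.get("level") != "5" or not text:
            continue
        center_y = int(row["top"]) + int(row["height"]) / 2
        words.append((center_y, int(row["left"]), text))

    # Group words into visual rows by vertical position, so that table
    # cells found in different blocks still end up on the same output line.
    words.sort()
    rows = []  # each entry: [reference_y, [(left, text), ...]]
    for center_y, left, text in words:
        if rows and abs(center_y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([center_y, [(left, text)]])

    rendered = []
    for _y, row_words in rows:
        line = ""
        for left, text in sorted(row_words):
            col = left // char_width
            if line:
                # Pad to the word's column, keeping at least one space.
                line = line.ljust(max(col, len(line) + 1))
            else:
                line = " " * col
            line += text
        rendered.append(line)
    return "\n".join(rendered)
```

You would call it as `print(layout_from_tsv(open("out.tsv", encoding="utf-8").read()))`. Mapping pixel positions to character columns only works reasonably when the font size is fairly uniform; for real tables a dedicated layout-analysis tool in front of Tesseract will likely do better.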

Zdenko


