Traineddata files


Philippe Argouarch

Feb 13, 2024, 12:51:35 AM
to tesseract-ocr
What if there are no traineddata files for a language? How do I start building a traineddata file for the Breton language?

Tom Morris

Feb 14, 2024, 2:59:36 PM
to tesseract-ocr
On Tuesday, February 13, 2024 at 12:51:35 AM UTC-5 argo...@gmail.com wrote:
What if there are no traineddata files for a language? How do I start building a traineddata file for the Breton language?

Searching the archives / group for "training from scratch" should turn up lots of previous discussions.

Tom 

Philippe Argouarch

Feb 19, 2024, 1:30:37 AM
to tesser...@googlegroups.com
Thanks for answering.
I found the Breton Tesseract data. My question now is why Tesseract does not accept PDF files as input. PDFs are images, aren't they?
Regards,
Philippe


Tom Morris

Feb 19, 2024, 5:36:10 PM
to tesseract-ocr
On Monday, February 19, 2024 at 1:30:37 AM UTC-5 argo...@gmail.com wrote:
... My question now is why Tesseract does not accept PDF files as input. PDFs are images, aren't they?

PDF files can contain text, graphics, images, or a mix of them all.

If you have PDF files that contain images, you can extract them using
utilities like Poppler's pdfimages. https://askubuntu.com/a/150106
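
As a rough illustration of that workflow, here is a minimal Python sketch that extracts the embedded images with pdfimages and then runs Tesseract on each one. It assumes both command-line tools are installed and on the PATH, that the Breton data is installed under the language code "bre", and that the PDF really does contain scanned page images. The function names and the "page" file prefix are made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_cmd(pdf_path, prefix):
    # -png asks pdfimages to write each embedded image as a PNG file
    # named <prefix>-000.png, <prefix>-001.png, and so on.
    return ["pdfimages", "-png", str(pdf_path), str(prefix)]


def tesseract_cmd(image_path, output_base, lang="bre"):
    # Tesseract appends ".txt" to output_base itself.
    # "bre" is assumed to be the installed Breton traineddata.
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pdf(pdf_path, out_dir, lang="bre"):
    """Extract images from a PDF and OCR each one (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: dump the embedded images into a temporary directory.
        subprocess.run(pdfimages_cmd(pdf_path, Path(tmp) / "page"), check=True)
        # Step 2: run Tesseract on each extracted image, in page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            base = out_dir / image.stem
            subprocess.run(tesseract_cmd(image, base, lang), check=True)
            results.append(base.with_suffix(".txt"))
    return results
```

Calling ocr_pdf("scan.pdf", "ocr_out") should leave one .txt file per extracted image in ocr_out. If the PDF already contains real text rather than scanned images, pdfimages will find little or nothing to extract, and OCR is not needed in the first place.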

Tom