Training a new language to perform OCR with Tesseract?


Kunal Athreya

Mar 14, 2023, 5:59:58 AM
to tesseract-ocr
I have prepared ocrd-testset.zip for the language I'm trying to train, but I don't understand what the following line in the tesstrain repository means:

Tesseract expects some configuration data (a file radical-stroke.txt and *.unicharset for all scripts) in DATA_DIR.


What am I missing here? How can I train a model?

Zdenko Podobny

Mar 24, 2023, 3:55:25 AM
to tesser...@googlegroups.com
Did you follow the instructions at https://github.com/tesseract-ocr/tesstrain#language-data?

Zdenko
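
For readers landing on this thread: the quoted README line says that a file radical-stroke.txt and the per-script *.unicharset files must be present in DATA_DIR before training. The README section linked above describes, at the time of writing, a `make tesseract-langdata` target for fetching them. Below is a minimal Python sketch for checking whether those files are present. The function name `missing_langdata` is hypothetical, and the recursive search is an assumption made because the exact subdirectory under DATA_DIR may differ between tesstrain versions.

```python
from pathlib import Path


def missing_langdata(data_dir):
    """Return the names of expected langdata items not found under data_dir.

    Hypothetical helper, not part of tesstrain. It searches recursively
    because the exact location below DATA_DIR is an assumption here.
    """
    root = Path(data_dir)
    missing = []
    # The README line names one fixed file ...
    if not any(root.rglob("radical-stroke.txt")):
        missing.append("radical-stroke.txt")
    # ... and a family of files, one per script.
    if not any(root.rglob("*.unicharset")):
        missing.append("*.unicharset")
    return missing


if __name__ == "__main__":
    # "data" is assumed to be the DATA_DIR in use; adjust as needed.
    print(missing_langdata("data"))
```

An empty list means both kinds of file were found somewhere under the directory. It does not verify that a unicharset exists for every script, only that at least one is present.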


