Training a new language to perform OCR with Tesseract?

Kunal Athreya

Mar 14, 2023, 5:59:58 AM

to tesseract-ocr

I have prepared ocrd-testset.zip for the language I'm trying to train, but I don't understand what the following line in the tesstrain repository means:

Tesseract expects some configuration data (a file radical-stroke.txt and *.unicharset for all scripts) in DATA_DIR.

What am I missing here? How can I train a model?

Zdenko Podobny

Mar 24, 2023, 3:55:25 AM

to tesser...@googlegroups.com

Did you follow the instructions at https://github.com/tesseract-ocr/tesstrain#language-data?

Zdenko
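
For readers landing on this thread: the quoted README line means that the directory used as DATA_DIR must contain radical-stroke.txt and one or more *.unicharset files before training starts. As far as I can tell, the linked README section describes a `make tesseract-langdata` target for fetching them, but check the README for the current wording. Below is a minimal Python sketch for checking whether a directory already has those files. The function name `missing_langdata` and the default directory name `data` are assumptions made for illustration, not part of tesstrain.

```python
from pathlib import Path


def missing_langdata(data_dir):
    """Return the names/patterns of expected config files absent from data_dir.

    Hypothetical helper, not part of tesstrain. It only checks for the two
    kinds of files named in the README sentence quoted in this thread.
    """
    root = Path(data_dir)
    missing = []
    # The README names one specific file ...
    if not (root / "radical-stroke.txt").is_file():
        missing.append("radical-stroke.txt")
    # ... and a family of per-script files matching *.unicharset.
    if not any(root.glob("*.unicharset")):
        missing.append("*.unicharset")
    return missing


if __name__ == "__main__":
    # "data" is an assumed location; pass whatever you use as DATA_DIR.
    problems = missing_langdata("data")
    if problems:
        print("Missing from DATA_DIR:", ", ".join(problems))
    else:
        print("DATA_DIR contains the expected configuration files.")
```

If the check reports missing files, the language-data step from the README has most likely not been run yet for that directory.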

