Scripts to generate langdata

Sim Tov

Aug 16, 2021, 11:49:59 AM
to tesser...@googlegroups.com
Hello,

I'm learning how to train Tesseract for a new script, and one of the stages is generating langdata.

I saw the examples here:


I can provide lang/lang.training_text and lang/lang.wordlist.

What is the purpose of the rest of the files, e.g. lang.unicharambigs and lang.singles_text? Do they depend on lang/lang.training_text and lang/lang.wordlist, and if so, how do I generate them?

Thank you!

Sim Tov

Aug 17, 2021, 7:41:09 AM
to tesser...@googlegroups.com
Are lang.unicharambigs and lang.singles_text language-dependent or script-dependent? Suppose I want to train Tesseract for German in Fraktur script and I already have langdata for regular German. Can I reuse deu.unicharambigs and deu.singles_text from there, or do I need to generate them somehow based on my Fraktur font?

Thank you!