Steps to create traineddata for specific font

Samruddhi Dhake

Nov 9, 2021, 6:20:12 AM
to tesseract-ocr
Hello,

I am working with Tesseract v4.1.1 on Windows 10.
I am trying to create traineddata for a specific font.
Could anyone please list the steps to train for a specific font?
I know the basic steps and am able to create custom traineddata. For a specific font, I am using tesstrain.sh,
but I am facing many issues. Could anyone please guide me?

Regards,
Samruddhi
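
For reference, the per-font step that tesstrain.sh automates can be sketched as follows: render a training text in the target font with text2image, then run tesseract with the lstm.train config to turn the resulting image/box pair into an .lstmf file. This is a minimal sketch, not an official recipe. It assumes a bash shell (on Windows 10 that could be WSL or similar) and the Tesseract training tools on the PATH; the font name, directories, training text file and the helper functions `run` and `render_font` are hypothetical, and setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: render line images for ONE font and convert them to an .lstmf file,
# roughly the per-font work that tesstrain.sh automates.
# All paths, the font name and the training text file are hypothetical placeholders.
set -euo pipefail

# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

render_font() {
  local text_file="$1" font="$2" fonts_dir="$3" out_dir="$4"
  # File-name-safe version of the font name, e.g. "My Font" -> "My_Font"
  local base="$out_dir/eng.${font// /_}.exp0"
  # text2image renders the text in the given font and writes $base.tif and $base.box
  run text2image --text="$text_file" --outputbase="$base" \
      --font="$font" --fonts_dir="$fonts_dir"
  # The lstm.train config makes tesseract write $base.lstmf from the tif/box pair
  run tesseract "$base.tif" "$base" lstm.train
}
```

For example, `DRY_RUN=1 render_font training_text.txt "My Font" /usr/share/fonts ./out` previews the two commands; drop DRY_RUN to actually run them. The font name must match what text2image reports for the font file in the fonts directory.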

myne...@163.com

Nov 15, 2021, 1:39:45 AM
to tesser...@googlegroups.com, Samruddhi Dhake
Hello,

I am encountering the same issue and am awaiting any feedback on it.

Could anybody give us some guidance on training for a specific font?

Many thanks!

Ant

-------- Original message --------
From: Samruddhi Dhake <sam22...@gmail.com>
Date: Tue, Nov 9, 2021, 19:20
To: tesseract-ocr <tesser...@googlegroups.com>
Subject: [tesseract-ocr] Steps to create traineddata for specific font
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.

Osman M

Nov 22, 2021, 12:55:00 AM
to tesseract-ocr
I've spent many hours trying to figure out how to do this, and went down many false paths. 

From the documentation, the apparent way to do this is the procedure called "Fine Tuning for Impact" (https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for-impact), since tesstrain.sh is now deprecated:

training/lstmtraining --model_output /path/to/output [--max_image_MB 6000] \
  --continue_from /path/to/existing/model \
  --traineddata /path/to/original/traineddata \
  [--perfect_sample_delay 0] [--debug_interval 0] \
  [--max_iterations 0] [--target_error_rate 0.01] \
  --train_listfile /path/to/list/of/filenames.txt
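
The command above leaves every path as a placeholder. A fuller sequence, modelled on the tessdoc fine-tuning examples, extracts the LSTM network from an existing model with combine_tessdata -e, continues training on the new font's .lstmf files, and then converts the final checkpoint into a .traineddata file with lstmtraining --stop_training. This is a sketch, not an official script: it assumes bash, a float ("best") eng.traineddata as the starting model, and the training tools on the PATH; the paths, the model name `myfont` and the helpers `run` and `finetune` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a fine-tuning run for one new font.
# Paths and the output model name "myfont" are hypothetical placeholders.
set -euo pipefail

# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

finetune() {
  local base_traineddata="$1" listfile="$2" out_dir="$3" iterations="${4:-400}"
  # 1. Extract the LSTM network from the existing model to continue from
  run combine_tessdata -e "$base_traineddata" "$out_dir/eng.lstm"
  # 2. Continue training on the .lstmf files listed in $listfile
  run lstmtraining --model_output "$out_dir/myfont" \
      --continue_from "$out_dir/eng.lstm" \
      --traineddata "$base_traineddata" \
      --train_listfile "$listfile" \
      --max_iterations "$iterations"
  # 3. Turn the latest checkpoint (<model_output>_checkpoint) into a .traineddata file
  run lstmtraining --stop_training \
      --continue_from "$out_dir/myfont_checkpoint" \
      --traineddata "$base_traineddata" \
      --model_output "$out_dir/myfont.traineddata"
}
```

For example, `DRY_RUN=1 finetune tessdata_best/eng.traineddata eng.training_files.txt ./finetune_out` prints the three commands; the iteration count (400 here) is only a starting value to adjust.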

However, it's not clear where or how to include the new font's data. The lstmtraining manpage on GitHub seems out of date and does not match the usage text the Linux binary prints (what you get by entering "lstmtraining" in the terminal). That usage text mentions a --fonts_dir parameter, but I haven't tried it with my new font yet. I also don't know what value --train_listfile is supposed to take, since tesstrain.sh, which used to generate that file, is deprecated.
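
On the --train_listfile question: in the tessdoc examples it is a plain text file listing one .lstmf file path per line (tesstrain.sh wrote it as <lang>.training_files.txt, e.g. eng.training_files.txt). A minimal sketch for building such a file from a directory of .lstmf files, assuming bash; the directory names and the `make_listfile` helper are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: build the file passed to lstmtraining --train_listfile,
# i.e. one .lstmf path per line. Directory names are hypothetical.
set -euo pipefail

make_listfile() {
  local lstmf_dir="$1" listfile="$2"
  # Collect every .lstmf file under the directory, in a stable order
  find "$lstmf_dir" -type f -name '*.lstmf' | sort > "$listfile"
  # Fail loudly if nothing was found, since training needs at least one file
  if [ ! -s "$listfile" ]; then
    echo "no .lstmf files under $lstmf_dir" >&2
    return 1
  fi
}
```

For example, `make_listfile ./out ./eng.training_files.txt` and then pass `--train_listfile ./eng.training_files.txt` to lstmtraining.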

Can someone from the Tesseract team please clarify this?

Thanks,

Osman