Cursive letters

Leder Extreme BR

Apr 18, 2024, 1:07:46 AM

to tesseract-ocr

Hello, I'm testing Tesseract and I can't get it to process text set in cursive fonts. How should I proceed in this situation? Should I train a model myself? If so, do you have any tips on how to do it? I'm new to Tesseract, so any help is appreciated.

Yaofu Zhou

May 21, 2024, 2:05:52 PM

to tesseract-ocr

Yes, please take a look at Tesstrain, and particularly its Makefile, so that you know what the training process involves. I would go over the official Tesstrain documentation and run "make help" to see what input is needed. One of the many items you have not specified is the CNN-LSTM network spec, which you can ask GPT/Claude to explain to you.
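As a rough illustration of the kind of input "make help" lists, here is a small Python sketch that only builds (and prints) a Tesstrain make command line. The variable names MODEL_NAME, START_MODEL, TESSDATA and MAX_ITERATIONS and the "training" target are written from memory of the Tesstrain README, so treat them as assumptions and check them against "make help". The model name "cursive" and the tessdata path are made up for the example.

```python
import shlex


def tesstrain_make_command(model_name, start_model=None, tessdata=None,
                           max_iterations=None, target="training"):
    """Build (without running) a `make` command line for Tesstrain.

    The variable names are assumptions recalled from the Tesstrain README;
    verify them with `make help` before relying on them.
    """
    cmd = ["make", target, f"MODEL_NAME={model_name}"]
    if start_model is not None:
        # Fine-tune from an existing model instead of training from scratch.
        cmd.append(f"START_MODEL={start_model}")
    if tessdata is not None:
        # Directory holding the .traineddata file of the start model.
        cmd.append(f"TESSDATA={tessdata}")
    if max_iterations is not None:
        cmd.append(f"MAX_ITERATIONS={max_iterations}")
    return cmd


# Hypothetical values: a model called "cursive" fine-tuned from "eng".
cmd = tesstrain_make_command("cursive", start_model="eng",
                             tessdata="/usr/share/tessdata",
                             max_iterations=10000)
print(shlex.join(cmd))
```

Nothing is executed here; the printed line is what you would run from inside the Tesstrain checkout once your ground-truth files are in place.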

Furthermore, you can use GPT or Claude to digest the Makefile for you so that you know what binaries are invoked during different steps of the training process. Once you find the binaries involved, you can do something like "lstmtraining --help" for each binary and check for the complete list of options, some of which are not specified in the Tesstrain Makefile.
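If you want to collect the help text of every binary in one go, something like the following Python sketch may do. The list of binaries is my recollection of what the Tesstrain Makefile calls, not an authoritative list, so adjust it to whatever you actually find in the Makefile.

```python
import shutil
import subprocess

# Assumed list of training binaries; edit it to match the Makefile you have.
BINARIES = [
    "lstmtraining",
    "combine_tessdata",
    "unicharset_extractor",
    "combine_lang_model",
]


def collect_help(binaries, timeout=30):
    """Return {binary name: help text}, or None for binaries not on PATH."""
    results = {}
    for name in binaries:
        if shutil.which(name) is None:
            results[name] = None
            continue
        # Some tools print usage to stderr, so merge both streams.
        proc = subprocess.run(
            [name, "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
            timeout=timeout,
        )
        results[name] = proc.stdout
    return results


for name, text in collect_help(BINARIES).items():
    if text is None:
        print(f"{name}: not found on PATH")
    else:
        print(f"{name}: {len(text.splitlines())} lines of help text")
```

You can then dump each help text to a file and compare the available options with the ones the Makefile actually sets.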

Once you have digested the Tesstrain Makefile, it will become clear that, messy as it may be, it is just an ugly wrapper that runs various Tesseract binaries in sequence, something you can implement yourself. You can then (with the help of GPT/Claude) tailor the Makefile to your needs, or even turn it into an equivalent Python script that is easier to modify. This is almost certainly necessary if your training set is very large.
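To give an idea of what such a Python script could look like, here is a minimal dry-run sketch. It is not a translation of the actual Makefile: the order of the steps, the flags and the directory layout (data/all-gt, data/list.train, data/langdata and so on) are assumptions from memory, so check every flag against the "--help" output of the binary. It also leaves out the step that produces the .lstmf files and the train/eval lists. The net spec string is only an illustrative value, and its output size (the number after "O1c") has to match your unicharset.

```python
import shlex
import subprocess
from pathlib import Path


def training_commands(model, data_dir, net_spec, max_iterations):
    """Return the assumed sequence of commands as lists of arguments."""
    data = Path(data_dir)
    unicharset = data / "unicharset"
    proto = data / model / f"{model}.traineddata"   # starter traineddata
    checkpoints = data / "checkpoints" / model      # checkpoint file prefix
    return [
        # 1. Collect the character set from the concatenated ground truth.
        ["unicharset_extractor", "--output_unicharset", str(unicharset),
         "--norm_mode", "2", str(data / "all-gt")],
        # 2. Build the starter traineddata from that unicharset.
        ["combine_lang_model", "--input_unicharset", str(unicharset),
         "--script_dir", str(data / "langdata"),
         "--output_dir", str(data), "--lang", model],
        # 3. Train the network from scratch with the given spec.
        ["lstmtraining", "--traineddata", str(proto),
         "--net_spec", net_spec,
         "--model_output", str(checkpoints),
         "--train_listfile", str(data / "list.train"),
         "--eval_listfile", str(data / "list.eval"),
         "--max_iterations", str(max_iterations)],
        # 4. Convert the last checkpoint into a usable .traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{checkpoints}_checkpoint",
         "--traineddata", str(proto),
         "--model_output", str(data / f"{model}.traineddata")],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical model name, paths and net spec, printed but not executed.
commands = training_commands(
    "cursive", "data",
    "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]",
    10000,
)
run_pipeline(commands, dry_run=True)
```

Keeping each step as a plain list of arguments makes it easy to add, drop or reorder steps, or to run them in batches when the training set is large.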
