Training a new font on windows. Help with exact command.

John S

Aug 1, 2024, 1:42:55 PM
to tesseract-ocr
I'm attempting to train Tesseract for a new English font. I've created a folder ./output containing images, ground truth and box files. Does this folder need to be in the tesseract folder?
What is the exact command to perform the training? According to the documentation, it's:
>>make training MODEL_NAME=name-of-the-resulting-model

However, the documentation is thin on details. For example, how do I specify the paths to the raw data, and what should MODEL_NAME be if I'm just fine-tuning and not creating a new language?

Menelik Berhan

Aug 30, 2024, 11:15:21 PM
to tesseract-ocr
You can give MODEL_NAME any value.

To specify the path to the data directory, use DATA_DIR:
DATA_DIR    Data directory for output files, proto model, start model, etc. Default: data

For example, if MODEL_NAME=abc and DATA_DIR=data, you need to put the ground truth files (box, gt.txt & tif) in 'data/abc-ground-truth',

OR set the value of GROUND_TRUTH_DIR directly.
(In both cases OUTPUT_DIR, described as "Output directory for generated files. Default: DATA_DIR/MODEL_NAME", will be 'data/abc'.)
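
As a rough sketch of the two options above, using the example values MODEL_NAME=abc and DATA_DIR=data and the ./output folder from the question: the script below only prepares the default ground truth folder and prints the make commands, it does not start training. It assumes you will run the printed command from the directory that contains the training Makefile, so the relative paths resolve from there. TRAIN_CMD and ALT_CMD are just illustrative shell variable names.

```shell
#!/bin/sh
# Sketch only. MODEL_NAME=abc and DATA_DIR=data are the example values
# used above; ./output is the folder from the original question.
MODEL_NAME=abc
DATA_DIR=data

# Default ground truth location: DATA_DIR/MODEL_NAME-ground-truth
GROUND_TRUTH_DIR="$DATA_DIR/$MODEL_NAME-ground-truth"
mkdir -p "$GROUND_TRUTH_DIR"

# Option 1: copy the existing box, gt.txt and tif files into the
# default location (skipped if ./output is not present here).
if [ -d ./output ]; then
    cp ./output/* "$GROUND_TRUTH_DIR"/
fi
TRAIN_CMD="make training MODEL_NAME=$MODEL_NAME DATA_DIR=$DATA_DIR"
echo "$TRAIN_CMD"

# Option 2: leave the files where they are and point
# GROUND_TRUTH_DIR at them directly.
ALT_CMD="make training MODEL_NAME=$MODEL_NAME GROUND_TRUTH_DIR=./output"
echo "$ALT_CMD"
```

With either command the generated files should end up in 'data/abc', as noted above.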

