What does --noextract_font_properties do?

Timothy Snyder

May 15, 2019, 10:10:42 AM

to tesseract-ocr

Hey all, quick question:

What does --noextract_font_properties do when using tesstrain.sh?

I've been using the flag for training since it's used in the training guide on GitHub. However, I can't seem to find any usage information for it there.

tesstrain.sh doesn't seem to include it in its usage info:

echo -e "USAGE: tesstrain.sh
     --exposures EXPOSURES      # A list of exposure levels to use (e.g. "-1 0 1").
     --fontlist FONTS           # A list of fontnames to train on.
     --fonts_dir FONTS_PATH     # Path to font files.
     --lang LANG_CODE           # ISO 639 code.
     --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
     --linedata_only            # Only generate training data for lstmtraining.
     --output_dir OUTPUTDIR     # Location of output traineddata file.
     --overwrite                # Safe to overwrite files in output_dir.
     --run_shape_clustering     # Run shape clustering (use for Indic langs).
     --maxpages                 # Specify maximum pages to output (default:0=all)
     --save_box_tiff            # Save box/tiff pairs along with lstmf files.
     --xsize                    # Specify width of output image (default:3600)

OPTIONAL flag for specifying directory with user specified box/tiff pairs.
Files should be named similar to ${LANG_CODE}.${fontname}.exp${EXPOSURE}.box/tif
     --my_boxtiff_dir MY_BOXTIFF_DIR # Location of user specified box/tiff files.

OPTIONAL flags for input data. If unspecified we will look for them in
the langdata_dir directory.
     --training_text TEXTFILE   # Text to render and use for training.
     --wordlist WORDFILE        # Word list for the language ordered by
                                # decreasing frequency.
OPTIONAL flag to specify location of existing traineddata files, required
during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
the current environment.