Hey all, quick question:
What does --noextract_font_properties do when using tesstrain.sh?
I've been using the flag for training because the training guide on GitHub uses it. However, I can't find any usage information for it there.
tesstrain.sh doesn't seem to include it in its usage info:
    echo -e "USAGE: tesstrain.sh
    --exposures EXPOSURES     # A list of exposure levels to use (e.g. "-1 0 1").
    --fontlist FONTS          # A list of fontnames to train on.
    --fonts_dir FONTS_PATH    # Path to font files.
    --lang LANG_CODE          # ISO 639 code.
    --langdata_dir DATADIR    # Path to tesseract/training/langdata directory.
    --linedata_only           # Only generate training data for lstmtraining.
    --output_dir OUTPUTDIR    # Location of output traineddata file.
    --overwrite               # Safe to overwrite files in output_dir.
    --run_shape_clustering    # Run shape clustering (use for Indic langs).
    --maxpages                # Specify maximum pages to output (default:0=all)
    --save_box_tiff           # Save box/tiff pairs along with lstmf files.
    --xsize                   # Specify width of output image (default:3600)

    OPTIONAL flag for specifying directory with user specified box/tiff pairs.
    Files should be named similar to ${LANG_CODE}.${fontname}.exp${EXPOSURE}.box/tif
    --my_boxtiff_dir MY_BOXTIFF_DIR # Location of user specified box/tiff files.

    OPTIONAL flags for input data. If unspecified we will look for them in
    the langdata_dir directory.
    --training_text TEXTFILE  # Text to render and use for training.
    --wordlist WORDFILE       # Word list for the language ordered by
                              # decreasing frequency.

    OPTIONAL flag to specify location of existing traineddata files, required
    during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
    the current environment.
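
Since the usage text doesn't mention the flag, one way to track it down is to search the training scripts themselves. This is only a minimal sketch: `find_flag` is a hypothetical helper name, and the checkout path in the example is an assumption that will differ per setup.

```shell
# Hypothetical helper: print every line under a directory that mentions a flag.
find_flag() {
    local flag="$1" dir="${2:-.}"
    # -r: recurse into subdirectories, -n: show line numbers,
    # -F: treat the pattern as a literal string, --: end of grep options
    grep -rnF -- "$flag" "$dir"
}

# Example call (assumed path; point it at your own Tesseract checkout):
# find_flag noextract_font_properties ~/tesseract/src/training
```

Each match is printed as `file:line:text`, which should show where the flag is parsed and which variable it sets, even though the usage string doesn't list it.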
Thanks!