Would like to contribute small change to expose --ptsize option to tesstrain.sh


David Maung

Nov 1, 2019, 2:08:21 PM
to tesseract-dev
Hello,

I am exploring training with different size fonts to see if it makes a difference in my situation.  To do this, I exposed the --ptsize option from text2image to tesstrain.sh.

text2image has a --ptsize option, which defaults to 12 if not set. I made a small change to tesstrain_utils.sh to expose this option through tesstrain.sh. It can then be used as follows:

src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain --ptsize 16
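
To compare several font sizes, as the post describes, the command above could be repeated once per size. The following is only a dry-run sketch: it prints the commands rather than running them, it assumes a tesstrain.sh that already carries the --ptsize patch, and the helper name `build_cmd`, the sizes 10/12/16, and the per-size output directories are illustrative assumptions, not part of the original post.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_cmd is a hypothetical helper: it prints (does not run) one
# tesstrain.sh invocation for the given point size.
# Assumption: tesstrain.sh has been patched to accept --ptsize.
build_cmd() {
    local size=$1
    echo "src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only" \
         "--noextract_font_properties --langdata_dir ../langdata" \
         "--tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain_pt${size}" \
         "--ptsize ${size}"
}

# Illustrative sizes; a separate output directory per size keeps runs apart.
for size in 10 12 16; do
    build_cmd "${size}"
done
```

Writing each size to its own output directory keeps the generated training data separate, so the resulting models can be compared afterwards.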

If this is of interest, how can I contribute the change?  The file differences are as follows:

@@ -44,6 +44,7 @@ RUN_SHAPE_CLUSTERING=false
 SAVE_BOX_TIFF=false
 WORKSPACE_DIR=$(mktemp -d)
 X_SIZE=3600
+PT_SIZE=12
 
 # set TESSDATA_PREFIX as empty, if not defined in environment to avoid an unbound variable
 TESSDATA_PREFIX=${TESSDATA_PREFIX:-}
@@ -153,6 +154,9 @@ parse_flags() {
             --maxpages)
                 parse_value "MAX_PAGES" ${ARGV[$j]:-}
                 i=$j ;;
+            --ptsize)
+                parse_value "PT_SIZE" ${ARGV[$j]:-}
+                i=$j ;;
             --my_boxtiff_dir)
                 parse_value "MY_BOXTIFF_DIR" ${ARGV[$j]:-}
                 i=$j ;;
@@ -244,7 +248,7 @@ initialize_fontconfig() {
     export FONT_CONFIG_CACHE
     local sample_path=${FONT_CONFIG_CACHE}/sample_text.txt
     echo "Text" >${sample_path}
-    run_command text2image --fonts_dir=${FONTS_DIR} \
+    run_command text2image --fonts_dir=${FONTS_DIR} --ptsize ${PT_SIZE} \
         --font="${FONTS[0]}" --outputbase=${sample_path} --text=${sample_path} \
         --fontconfig_tmpdir=${FONT_CONFIG_CACHE}
 }
@@ -275,7 +279,7 @@ generate_font_image() {
       fi
     done
 
-    run_command text2image ${common_args} --font="${font}" \
+    run_command text2image ${common_args} --font="${font}" --ptsize ${PT_SIZE} \
         --text=${TRAINING_TEXT}  ${TEXT2IMAGE_EXTRA_ARGS:-}
     check_file_readable ${outbase}.box ${outbase}.tif
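
The flag handling in the diff above follows a common pattern: set a default, then override it when the flag appears on the command line. Below is a minimal standalone sketch of that pattern. The function `parse_ptsize` is hypothetical and is not the `parse_flags`/`parse_value` code from tesstrain_utils.sh, and the positive-integer check is an added assumption rather than something the patch does.

```shell
#!/usr/bin/env bash
set -euo pipefail

PT_SIZE=12   # default, mirroring the PT_SIZE=12 line added in the diff

# parse_ptsize is a hypothetical, simplified stand-in for parse_flags.
# It scans its arguments and overrides PT_SIZE when --ptsize is present.
parse_ptsize() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --ptsize)
                # Assumption: the value must be a positive integer.
                if [[ ! "${2:-}" =~ ^[1-9][0-9]*$ ]]; then
                    echo "ERROR: --ptsize needs a positive integer" >&2
                    return 1
                fi
                PT_SIZE=$2
                shift 2 ;;
            *)
                # Ignore every other argument in this sketch.
                shift ;;
        esac
    done
}

parse_ptsize --lang eng --ptsize 16
echo "text2image would receive: --ptsize ${PT_SIZE}"
```

If --ptsize is omitted, PT_SIZE keeps its default of 12, so existing invocations behave as before.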

Zdenko Podobny

Nov 1, 2019, 2:15:03 PM
to tesser...@googlegroups.com
Can you create a pull request to https://github.com/tesseract-ocr/tesseract?

Zdenko


On Fri, Nov 1, 2019 at 19:08, David Maung <davidm...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-dev/acf626f4-8865-4b51-838e-4904e6134842%40googlegroups.com.

David Maung

Nov 1, 2019, 2:38:05 PM
to tesseract-dev
I thought I needed to upload my branch to create a pull request? When I try to upload my branch, it fails because I have no write permission:

remote: Permission to tesseract-ocr/tesseract.git denied to DavidMaung.
fatal: unable to access 'https://github.com/tesseract-ocr/tesseract.git/': The requested URL returned error: 403

David

Zdenko Podobny

Nov 1, 2019, 2:39:48 PM
to tesser...@googlegroups.com
No, you need to fork tesseract on GitHub, and from there you can send a pull request to the official tesseract repository.

Zdenko
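
The fork-based workflow described above can be sketched with standard git commands. This is an illustrative outline only: the user name `your-github-user`, the branch name `expose-ptsize`, and the remote name `fork` are placeholders, and a scratch repository is used so that nothing here touches a real checkout or the network. The fork itself is assumed to have been created beforehand with the "Fork" button on GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values: replace with your own GitHub user and branch name.
GITHUB_USER="your-github-user"
BRANCH="expose-ptsize"

# Scratch repository so the sketch runs without a real tesseract checkout.
REPO_DIR=$(mktemp -d)
git init --quiet "${REPO_DIR}"
cd "${REPO_DIR}"

# 'origin' is the official repository; most contributors cannot push to it,
# which is what produces the 403 error quoted earlier in the thread.
git remote add origin https://github.com/tesseract-ocr/tesseract.git

# 'fork' is your own copy on GitHub, where you do have write permission.
git remote add fork "https://github.com/${GITHUB_USER}/tesseract.git"

git remote -v

# In a real checkout the branch is pushed to the fork, not to origin,
# and the pull request is then opened from the fork's page on GitHub.
echo "Next step in a real checkout: git push fork ${BRANCH}"
```

Pushing to the fork rather than to origin avoids the permission error, because write access is only needed on your own copy.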


On Fri, Nov 1, 2019 at 19:38, David Maung <davidm...@gmail.com> wrote:

David Maung

Nov 1, 2019, 3:30:00 PM
to tesseract-dev
I did it!  Thank you.