--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4011df1b-a0cc-46bc-ba9f-e6d6b7f62d64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Mon, May 14, 2018 at 1:52 PM, reza <reza...@gmail.com> wrote:
Hi, I tested Tesseract 4 beta on the Persian language and the results were good, but I think it needs more training on more fonts and texts. How can we train more fonts and texts on the existing Tesseract 4 beta model for Persian? And a last question: how can we apply a dictionary to correct words that were OCRed with errors? Thanks.
###### MAKING TRAINING DATA ######
=== Starting training for language 'eng'
[Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Arial --outputbase=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --text=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD
Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.CpgpM0lbxD/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Arial
Rendering using Corbel
[Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0 --max_pages=3 --font=Arial --text=./langdata/eng/eng.training_text
[Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0 --max_pages=3 --font=Corbel --text=./langdata/eng/eng.training_text
Stripped 2 unrenderable words
Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
Stripped 1 unrenderable words
Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
Stripped 2 unrenderable words
Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
Stripped 1 unrenderable words
Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Tue, May 15, 2018 11:42:39 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.6m4B2TUln1/eng/eng.unicharset --norm_mode 1 /tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box /tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box
Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
ICU ERROR: U_FILE_ACCESS_ERROR
ERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable

###### MAKING EVAL DATA ######
=== Starting training for language 'eng'
[Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Calibri --outputbase=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --text=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q
Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.n0qq4iJk4q/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Calibri
[Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0 --max_pages=3 --font=Calibri --text=./langdata/eng/eng.training_text
Stripped 2 unrenderable words
Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
Stripped 1 unrenderable words
Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Tue, May 15, 2018 11:42:42 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.h0l64TAxEq/eng/eng.unicharset --norm_mode 1 /tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
ICU ERROR: U_FILE_ACCESS_ERROR
ERROR: /tmp/tmp.h0l64TAxEq/eng/eng.unicharset does not exist or is not readable

#### combine_tessdata to extract lstm model from previous trained set ####
Extracting tessdata components from ./tessdata_best/eng.traineddata
Wrote ./trained_plus_chars/eng.lstm
Version string:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054

#### training from previous optimum #####
finetune.sh: line 119: 11664 Segmentation fault lstmtraining --model_output $train_output_dir/pluschars --continue_from $train_output_dir/$Lang.lstm --old_traineddata $tessdata_dir/$Lang.traineddata --traineddata $train_output_dir/$Lang/$Lang.traineddata --max_iterations $MaxIterations --debug_interval -1 --eval_listfile $eval_output_dir/$Lang.training_files.txt --train_listfile $train_output_dir/$Lang.training_files.txt

#### Building final trained file ./trained_plus_chars/eng_NEW.traineddata d####
finetune.sh: line 130: 11320 Segmentation fault lstmtraining --stop_training --continue_from $train_output_dir/pluschars_checkpoint --traineddata $train_output_dir/$Lang/$Lang.traineddata --model_output $final_trained_data_file
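
In the log above, unicharset_extractor reports that eng.unicharset "does not exist or is not readable", and both lstmtraining calls then end in a segmentation fault. A minimal sketch of a guard that could go in front of each lstmtraining call, assuming (this is not confirmed anywhere in the thread) that the crashes come from the missing input files. The function name require_files is hypothetical and not part of finetune.sh or Tesseract.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every input file exists and is readable
# before handing it to lstmtraining, so a missing file is reported by name.

require_files() {
  local f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
  return 0
}

# Possible use inside a script such as finetune.sh (variable names are the
# ones visible in the log; the surrounding script is not shown in the thread):
#
#   require_files "$train_output_dir/$Lang.lstm" \
#                 "$train_output_dir/$Lang/$Lang.traineddata" \
#                 "$train_output_dir/$Lang.training_files.txt" || exit 1
```

With a guard like this the script would stop at the first absent file and print its path.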
training/lstmtraining --model_output /path/to/output [--max_image_MB 6000] \
--continue_from /path/to/existing/model \
--traineddata /path/to/original/traineddata \
[--perfect_sample_delay 0] [--debug_interval 0] \
[--max_iterations 0] [--target_error_rate 0.01] \
--train_listfile /path/to/list/of/filenames.txt
In this command, what should be passed to the --continue_from and --traineddata arguments? I'm a bit confused.
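
One reading, consistent with the finetune.sh log earlier in this thread: --continue_from takes the LSTM component extracted from an existing traineddata file with combine_tessdata -e, and --traineddata takes the traineddata file itself (the original one, as long as the character set is unchanged). A minimal sketch under those assumptions. The list file path, the output name and the iteration count are placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder paths; only tessdata_best/eng.traineddata and
# trained_plus_chars/eng.lstm appear in the thread, the rest are assumptions.
LANG_CODE=eng
BEST_TRAINEDDATA=./tessdata_best/$LANG_CODE.traineddata
OUT_DIR=./trained_plus_chars
TRAIN_LIST=./train/$LANG_CODE.training_files.txt

# Print the command by default; run it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

finetune_commands() {
  # 1. Extract the LSTM model from the existing traineddata.
  run combine_tessdata -e "$BEST_TRAINEDDATA" "$OUT_DIR/$LANG_CODE.lstm"

  # 2. Continue training from that extracted model.
  run lstmtraining \
    --model_output "$OUT_DIR/finetuned" \
    --continue_from "$OUT_DIR/$LANG_CODE.lstm" \
    --traineddata "$BEST_TRAINEDDATA" \
    --train_listfile "$TRAIN_LIST" \
    --max_iterations 400
}

finetune_commands
```

If the character set changes, the log above suggests a different arrangement: the original file goes to --old_traineddata and the newly built one to --traineddata.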