Hope the information below helps :)
Create box files:
tesseract /path/to/image.tif path/and/nameof/boxfile/image lstmbox
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7bd32ea2-3af3-44e0-8c54-753ca6dd1f90%40googlegroups.com.
Alternatively, you can use the wordstrbox config file.
lstmbox creates character-level box files. wordstrbox creates line-level box files. If you use wordstrbox, create the unicharset from the ground-truth text instead of from the box files.
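To make the difference concrete, here is a minimal dry-run sketch that only prints the two invocations; it does not call tesseract. The helper name make_box_cmd and the output base name are assumptions for illustration. Both configs are passed as the last argument, and the box file is expected next to the output base name.

```shell
#!/bin/sh
# Dry-run helper: prints the tesseract command for a given box config.
# make_box_cmd is a hypothetical helper name, not part of Tesseract.
make_box_cmd() {
    image=$1      # input image, e.g. /path/to/image.tif
    outbase=$2    # output base name for the box file
    config=$3     # lstmbox (character level) or wordstrbox (line level)
    case "$config" in
        lstmbox|wordstrbox) ;;
        *) echo "unknown config: $config" >&2; return 1 ;;
    esac
    printf 'tesseract %s %s %s\n' "$image" "$outbase" "$config"
}

# Character-level boxes, then line-level boxes, for the same image.
make_box_cmd /path/to/image.tif /path/to/image lstmbox
make_box_cmd /path/to/image.tif /path/to/image wordstrbox
```

Copy the printed line into the terminal once the paths look right.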
On Thu, May 28, 2020, 20:49 Владимир Калачихин <v.kala...@gmail.com> wrote:
On Thursday, May 28, 2020, 16:36:14 UTC+3, shree wrote: Alternately you can use wordstrbox config file.
What is "wordstrbox config file"?
I don't quite understand. Could you give an example of using tesseract to create wordstrbox files, and of using combine_lang_model with the ground-truth text?
Input files:
myfile1.png
myfile1.gt.txt
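Since each image needs a ground-truth file with the same base name, a quick check before training can catch missing pairs. This is a minimal sketch; the helper name missing_gt is hypothetical, and the .png / .gt.txt naming is taken from the example above.

```shell
#!/bin/sh
# List images in a directory that lack a matching ground-truth file.
# missing_gt is a hypothetical helper; it assumes the naming scheme
# myfile1.png / myfile1.gt.txt shown above.
missing_gt() {
    dir=$1
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue      # glob matched nothing
        gt="${img%.png}.gt.txt"        # myfile1.png -> myfile1.gt.txt
        [ -f "$gt" ] || echo "$img"    # report images without ground truth
    done
}
```

Calling missing_gt on the data directory prints one path per unpaired image and nothing when every image has its .gt.txt file.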
OK, I want to train from training text and fonts. Which method should I use?
Use tesstrain.sh or tesstrain.py.
On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин <v.kala...@gmail.com> wrote:
Ok, I want to train from training text and fonts. Which method should I use?
### Create tif and box using fonts and training text:
text2image --fonts_dir=/home/ubuntu/.fonts --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont --text=../langdata/mylang/mylang.training_text
### Create unicharset from training_text:
unicharset_extractor --norm_mode 1 --output_unicharset ./output/folder/own.unicharset ../langdata/mylang/mylang.training_text
### Create starter traineddata (aka recoder):
combine_lang_model --input_unicharset ./out/own.unicharset --script_dir ./langdata --output_dir ./out --lang mylang
This is for the Latin script, not the Latin language. wget the file from https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset
### Train:
lstmtraining .....
This is what is missing: --net_spec. Check the line below that I mentioned before.
lstmtraining --traineddata ./out/own/own.traineddata --model_output ./output/own --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]" --train_listfile ./eng_ltsm/eng.training_files.txt --eval_listfile ./eng_ltsm/eng.training_files.txt --max_iterations 100
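The trailing number in O1c110 is the size of the output layer. A minimal sketch for filling it in from your own unicharset follows. It rests on two assumptions: that this number should match the number of unicharset entries, and that the first line of the unicharset file holds that count. The helper name net_spec_for is hypothetical; the rest of the spec is copied from the command above.

```shell
#!/bin/sh
# Build a --net_spec string whose output size is read from a unicharset.
# Assumption: the first line of the unicharset file is the entry count.
# net_spec_for is a hypothetical helper name, not a Tesseract tool.
net_spec_for() {
    unicharset=$1
    n=$(head -n 1 "$unicharset" | tr -d '[:space:]')
    case "$n" in
        ''|*[!0-9]*)
            echo "cannot read size from $unicharset" >&2
            return 1 ;;
    esac
    printf '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c%s]\n' "$n"
}
```

The printed string can then be passed in quotes as the value of --net_spec.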
### Create final traineddata:
lstmtraining --stop_training --continue_from ./output/mylang_checkpoint --traineddata ./out/mylang/mylang.traineddata --model_output ./output/mylang.traineddata