I think this happens when the complex characters in your training text are not part of the original Korean unicharset that the 4.00.00alpha kor.traineddata was trained with. Do 'replace top layer' training instead of fine-tuning. @abhishekchopde has had good results with it; see #1009. It will take longer than fine-tuning.

Hi Shree, I have a question. You posted the passage above, but the link in it is not right. Please check it again.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2878cbf6-a064-4fe5-ab5c-cfcd54248e9e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
https://github.com/tesseract-ocr/tesseract/issues/549
@harinath141 If you are getting a lot of these errors during finetune, try replace top layer training. You can use the box/tiff pairs generated for finetune. Commands will be similar to the following:
mkdir -p ~/tesstutorial/tellayer_from_tel
combine_tessdata -e ../tessdata/tel.traineddata \
  ~/tesstutorial/tellayer_from_tel/tel.lstm
lstmtraining -U ~/tesstutorial/tel/tel.unicharset \
  --script_dir ../langdata --debug_interval 0 \
  --continue_from ~/tesstutorial/tellayer_from_tel/tel.lstm \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --model_output ~/tesstutorial/tellayer_from_tel/tellayer \
  --train_listfile ~/tesstutorial/tel/tel.training_files.txt \
  --target_error_rate 0.01
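
For anyone adapting the commands above to another language, here is a rough dry-run sketch that only prints them with the paths pulled out into variables. The variable names (lang, base, tessdata, langdata, out) and the two helper functions are made up for illustration and are not Tesseract options. The flags are copied as given in the post; a different lstmtraining build may not accept all of them, so check `lstmtraining --help` on your install before running anything.

```shell
#!/bin/sh
# Dry run: print the replace-top-layer commands instead of executing them.
# lang, base, tessdata, langdata and out are illustrative shell variables,
# not Tesseract options. Adjust them to your own layout.
lang="tel"
base="$HOME/tesstutorial"
tessdata="../tessdata"
langdata="../langdata"
out="$base/${lang}layer_from_${lang}"

# Step 1: extract the LSTM model from the existing traineddata.
extract_cmd() {
  echo "combine_tessdata -e $tessdata/$lang.traineddata $out/$lang.lstm"
}

# Step 2: cut the network at layer index 5 and append a new top layer.
# The net_spec (including 105) is the value from the post, not a general default.
train_cmd() {
  echo "lstmtraining -U $base/$lang/$lang.unicharset" \
    "--script_dir $langdata --debug_interval 0" \
    "--continue_from $out/$lang.lstm" \
    "--append_index 5 --net_spec '[Lfx256 O1c105]'" \
    "--model_output $out/${lang}layer" \
    "--train_listfile $base/$lang/$lang.training_files.txt" \
    "--target_error_rate 0.01"
}

echo "mkdir -p $out"
extract_cmd
train_cmd
```

Once the printed commands look right for your paths, they can be pasted into a terminal or the `echo` wrappers can be dropped.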
I found the post you wrote, but the --script_dir option does not work with lstmtraining.
How should I change this option (flag)? What replaces it?