lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
--traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
--eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
grep ±
to check, and ± shows up only in the Truth lines, not in the OCR output.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
How big was your training text? How many iterations did you run? Did the fonts you used for training support the plus-minus sign?
You can run training with --debug_interval -1 so that the console messages show whether the plus-minus sign is being picked up for training.
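One quick way to answer the first question is to count how often ± actually appears in the training text. The following is a minimal sketch, not part of the Tesseract tooling: the helper name count_char and the training text path are assumptions for illustration, so substitute the file you really trained from.

```shell
# Count how many times a character occurs in a training text file.
# Usage: count_char CHAR FILE
count_char() {
  # -o prints each match on its own line; -F treats CHAR as a literal string.
  grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

# Hypothetical path; point this at the training text you actually used.
TRAINING_TEXT="$HOME/tesstutorial/langdata/chi_sim/chi_sim.training_text"
if [ -f "$TRAINING_TEXT" ]; then
  echo "occurrences of ±: $(count_char '±' "$TRAINING_TEXT")"
fi
```

If the count is zero or very small, the network has little or nothing to learn the new character from, whatever the number of iterations.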
On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
Thanks. It works. The new character I added was there.
Do you have any idea why, after fine-tuning, Tesseract still couldn't recognize the new character I added? When I tried to add '±' to eng it worked, but when I tried to add '±' to chi_sim it did not (explained below). Is there anything we need to pay attention to when fine-tuning languages other than eng?
I used
lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
--traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
--eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
grep ±
to check, and ± shows up only in the Truth lines, not in the OCR output.
On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
combine_tessdata -u new.traineddata new.
will unpack the traineddata file. Check new.lstm-unicharset in it.
On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
I tried to fine-tune the model and add a new character via training, but the new traineddata that was generated still couldn't recognize this character. To debug, I want to check whether the new character is in the .unicharset inside the new traineddata. Is there a way to do this?
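The unpack-and-check step suggested above can be scripted roughly as follows. This is only a sketch: the helper name unicharset_has is made up for illustration, and it assumes the unicharset layout where the first line holds the entry count and every later line begins with the character followed by a space. The file names new.traineddata and new.lstm-unicharset are the ones used in the message above.

```shell
# Succeeds (exit status 0) if CHAR is an entry in the given unicharset file.
# Assumption: line 1 is the entry count; each later line starts with the
# character itself, followed by whitespace and its properties.
unicharset_has() {
  awk -v c="$1" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$2"
}

# Guarded so the sketch does nothing when the training tools or
# new.traineddata are not present in the current directory.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f new.traineddata ]; then
  combine_tessdata -u new.traineddata new.
  if unicharset_has '±' new.lstm-unicharset; then
    echo "± is in new.lstm-unicharset"
  else
    echo "± is missing from new.lstm-unicharset"
  fi
fi
```

Matching on the first field rather than using a plain grep avoids false hits where the character only appears in another entry's properties.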
If you increase the number of iterations, the "plus" type of training (fine-tuning to add a few characters) will not give good results, i.e. the other characters will lose accuracy.
lstmtraining --stop_training \
--continue_from ~/tesstutorial/eng_from_chi/base_checkpoint \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--model_output ~/tesstutorial/eng_from_chi/eng.traineddata