I am working with Tesseract OCR and want to experiment with different binarization methods, such as Otsu's thresholding and other custom filters, to improve text recognition accuracy.
However, I am concerned that training with these different preprocessing techniques might modify or overwrite eng.traineddata, which I want to keep intact.
My questions are:
1. Does training a new model affect the existing eng.traineddata file?
2. How can I safely train Tesseract with new filters without modifying the default English model?
3. Is there a recommended approach to train Tesseract on preprocessed images while keeping eng.traineddata unchanged?

What I've tried:
I updated my current eng_new.traineddata with three samples, each with a different filter applied: Otsu, Otsu_Tresh_Binary, and Otsu_Tresh_Binary_Inv. After the first 1000 iterations, the trained traineddata differed from the initial one, but it gave slightly worse results.
lstmtraining \
  --continue_from /home/j/trainingCurrentEng/data/checkpoints/eng_trained \
  --traineddata /home/j/trainingCurrentEng/data/eng.traineddata \
  --train_listfile /home/j/trainingCurrentEng/data/list.train \
  --eval_listfile /home/j/trainingCurrentEng/data/list.eval \
  --model_output /home/j/trainingCurrentEng/data/checkpoints/eng_trained \
  --learning_rate 0.0001 \
  --debug_interval 10 \
  --max_iterations 600

tesseract otsu_tresh_binary_inv.tiff output_text -l eng --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
cat output_text.txt
Abcd123

tesseract otsu_tresh_binary_inv.tiff output_text_1 -l eng_trained --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
cat output_text_1.txt
Abc
I would appreciate any guidance or best practices for training custom models without interfering with existing ones.
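The first question can also be checked empirically. The sketch below is a minimal shell helper that records a checksum of a file, runs a command, and reports whether the file changed. The name check_unchanged is hypothetical (it is not a Tesseract tool), and the sketch assumes a POSIX shell with cksum available.

```shell
#!/bin/sh
# check_unchanged FILE COMMAND [ARGS...]
# Hypothetical helper, not part of Tesseract: runs COMMAND and reports
# whether FILE is byte-for-byte the same afterwards (compared via cksum).
check_unchanged() {
    file=$1
    shift
    before=$(cksum < "$file")   # checksum and size before the command
    status=0
    "$@" || status=$?           # run the command, remember its exit status
    after=$(cksum < "$file")    # checksum and size after the command
    if [ "$before" = "$after" ]; then
        echo "unchanged: $file"
    else
        echo "MODIFIED: $file"
        return 1
    fi
    return "$status"
}
```

For example, it could wrap the lstmtraining call above, with /home/j/trainingCurrentEng/data/eng.traineddata as the first argument, so that any change to the base model is reported right after training.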
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/2661c455-a141-4398-9542-10321a319510n%40googlegroups.com.
Can we hold an online meeting with a general invitation to those interested to discuss how to do this?
Yes, please message me on WhatsApp:
+66820510893