I'm trying to configure Tesseract to recognize 10-character alphanumeric strings (all uppercase).
This works fairly well, except that it often mixes up the following characters:
2 and Z
6 and G
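To put a number on "often", the substitutions could be counted against known-good strings. Below is a minimal sketch, assuming the expected and recognized strings are available as parallel arrays; the class name `ConfusionTally` and the sample strings are made up for illustration and are not taken from my data.

```java
import java.util.Map;
import java.util.TreeMap;

public class ConfusionTally {
    // Counts character substitutions between expected and recognized strings.
    // Keys look like "2->Z" (expected -> recognized).
    public static Map<String, Integer> tally(String[] expected, String[] recognized) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < expected.length; i++) {
            String e = expected[i];
            String r = recognized[i];
            if (e.length() != r.length()) {
                continue; // lengths differ, so characters cannot be aligned one to one
            }
            for (int j = 0; j < e.length(); j++) {
                if (e.charAt(j) != r.charAt(j)) {
                    counts.merge(e.charAt(j) + "->" + r.charAt(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not real OCR output.
        String[] expected = {"AB2Z6G09XY", "ZZ66GG22AA"};
        String[] recognized = {"ABZZ6G09XY", "ZZ6GGG22AA"};
        System.out.println(tally(expected, recognized)); // prints {2->Z=1, 6->G=1}
    }
}
```

A tally like this would show whether the errors are symmetric (2 read as Z as often as Z read as 2) or mostly go in one direction.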
Examples of images are:
These are preprocessed. I think the preprocessing went well, but I'd be glad to hear otherwise.
This is how I run Tesseract:
// Legacy Tesseract engine only; treat the image as a single text line (PSM 7)
Tesseract tesseract = new Tesseract();
tesseract.setOcrEngineMode(TessAPI.TessOcrEngineMode.OEM_TESSERACT_ONLY);
tesseract.setPageSegMode(7);
tesseract.setTessVariable("load_system_dawg", "0");
tesseract.setTessVariable("load_freq_dawg", "0");
tesseract.setTessVariable("load_punc_dawg", "0");
tesseract.setTessVariable("load_number_dawg", "0");
tesseract.setTessVariable("load_unambig_dawg", "0");
tesseract.setTessVariable("load_bigram_dawg", "0");
tesseract.setTessVariable("load_fixed_length_dawgs", "0");
tesseract.setTessVariable("classify_enable_learning", "0");
tesseract.setTessVariable("classify_enable_adaptive_matcher", "0");
tesseract.setTessVariable("segment_penalty_garbage", "0");
tesseract.setTessVariable("segment_penalty_dict_nonword", "0");
tesseract.setTessVariable("segment_penalty_dict_frequent_word", "0");
tesseract.setTessVariable("segment_penalty_dict_case_ok", "0");
tesseract.setTessVariable("segment_penalty_dict_case_bad", "0");
Note that this is Java code, but my question is not limited to Java.
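Independently of the engine settings, the recognized text can be checked against the expected format (10 uppercase alphanumeric characters), and the positions holding one of the four problem characters can be flagged for a second look. This is only a sketch in plain Java; `ResultCheck` and its methods are hypothetical names and not part of any Tesseract wrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ResultCheck {
    // Expected format: exactly 10 uppercase letters or digits.
    private static final Pattern EXPECTED = Pattern.compile("[A-Z0-9]{10}");
    // Characters that get mixed up in my results.
    private static final String AMBIGUOUS = "2Z6G";

    public static boolean hasExpectedFormat(String text) {
        // OCR output usually ends with whitespace or a newline, so trim first.
        return EXPECTED.matcher(text.trim()).matches();
    }

    // Returns the zero-based positions that hold one of the ambiguous characters.
    public static List<Integer> ambiguousPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        String s = text.trim();
        for (int i = 0; i < s.length(); i++) {
            if (AMBIGUOUS.indexOf(s.charAt(i)) >= 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String raw = "AB2Z6G09XY\n"; // hypothetical OCR result
        System.out.println(hasExpectedFormat(raw));  // prints true
        System.out.println(ambiguousPositions(raw)); // prints [2, 3, 4, 5]
    }
}
```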
I am not very experienced with Tesseract and find the documentation unclear. I hope someone can help me out.
To give some more context:
How do I train Tesseract?
I train Tesseract by combining over 200 images into one image. Every image contains 10 alphanumeric characters. Also, I am sure the box file is correct.
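One detail that matters when combining images: box-file coordinates are measured from the bottom-left corner of the image, so every box has to be shifted when its source image is pasted into a larger one. The sketch below assumes the images are stacked vertically, which is just one possible layout; `BoxShift` and its parameters are names made up for this illustration.

```java
public class BoxShift {
    // Rewrites one box-file line ("char left bottom right top page") for a tile
    // that was pasted into a taller combined image.
    // tileTop is the distance in pixels from the top edge of the combined image
    // to the top edge of the tile.
    public static String shift(String boxLine, int tileTop, int tileHeight, int combinedHeight) {
        String[] f = boxLine.trim().split("\\s+");
        // Box coordinates count upwards from the bottom, so the offset is the
        // number of pixel rows that lie below the tile in the combined image.
        int dy = combinedHeight - tileTop - tileHeight;
        int left = Integer.parseInt(f[1]);
        int bottom = Integer.parseInt(f[2]) + dy;
        int right = Integer.parseInt(f[3]);
        int top = Integer.parseInt(f[4]) + dy;
        return f[0] + " " + left + " " + bottom + " " + right + " " + top + " " + f[5];
    }

    public static void main(String[] args) {
        // Hypothetical layout: three 40-pixel-high tiles stacked into a 120-pixel-high image.
        System.out.println(shift("Z 5 8 25 34 0", 0, 40, 120));  // top tile:    Z 5 88 25 114 0
        System.out.println(shift("Z 5 8 25 34 0", 80, 40, 120)); // bottom tile: Z 5 8 25 34 0
    }
}
```

With vertical stacking the left and right values stay unchanged; a grid layout would need a horizontal offset as well.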
I build the final language by executing the following batch script:
tesseract qwe.combined.jpg qwe.combined.box nobatch box.train
echo combined 1 0 0 0 0 > font_properties
unicharset_extractor qwe.combined.box
shapeclustering -F font_properties -U unicharset qwe.combined.box.tr
mftraining -F font_properties -U unicharset -O qwe.unicharset qwe.combined.box.tr
cntraining qwe.combined.box.tr
copy inttemp qwe.inttemp
copy normproto qwe.normproto
copy pffmtable qwe.pffmtable
copy shapetable qwe.shapetable
combine_tessdata qwe.
How can I make Tesseract discriminate better between 2, Z, 6 and G?